Abstract. The study of finite automata and regular languages is a privileged meeting 
point of algebra and logic. Since the work of Büchi, regular languages have been classified 
according to their descriptive complexity, i.e. the type of logical formalism required to 
define them. The algebraic point of view on automata is an essential complement of this 
classification: by providing alternative, algebraic characterizations for the classes, it often 
yields the only opportunity for the design of algorithms that decide expressibility in some 
logical fragment. 

We survey the existing results relating the expressibility of regular languages in logical 
fragments of monadic second order logic with successor with algebraic properties of their 
minimal automata. In particular, we show that many of the best known results in this 
area share the same underlying mechanics and rely on a very strong relation between 
logical substitutions and block-products of pseudovarieties of monoids. We also explain 
the impact of these connections on circuit complexity theory. 



Kleene's theorem ensures that finite automata and regular expressions have the same 
expressive power and so we tend to forget that these two points of view on regular languages 
are of a different nature: regular expressions are well suited to reflect the combinatorial 
structure of a language while finite automata are first and foremost algebraic objects. It is 
intuitively clear that the combinatorial properties of a regular language should somehow be 
reflected in the structure of the corresponding automaton but this is difficult to formalize 
without resorting to algebra. 
On one hand, the algebraic point of view on finite automata pioneered by Eilenberg
[Eil76] has been a driving force in our understanding of regular languages: each letter 
of an automaton's alphabet defines a transformation of the set of states and one can identify 
an automaton with its transition monoid, i.e. the finite monoid generated by these functions. 
Any regular language L can then be canonically associated with the transition monoid of 
its minimal automaton (the syntactic monoid of L) and many important classes of regular 
languages defined combinatorially can be characterized by the algebraic properties of their 
syntactic monoid. 

On the other hand, Büchi showed in 1960 that the expressive power of monadic second
order logic with successor MSO[S] (or equivalently with order MSO[<]) is exactly 
that of finite automata [Buc60]. Since then numerous results have related the expressive 
power of various sublogics to well-known classes of regular languages. The most notable 
example of this sort concerns languages definable by a first-order sentence using order. 
McNaughton and Papert showed that a regular language is definable in FO[<] if and only if it 
is star-free [MP71], i.e. if and only if the language can be described by a regular expression 
constructed from the letters of the alphabet, the empty set symbol, concatenation, union 
and complementation. 

The result of McNaughton and Papert is non-trivial, but it states an equivalence 
between two not-so-different combinatorial descriptions of the class SF of star-free languages, 
and neither of these is of any help in deciding whether a given language belongs to SF. This is 
precisely why the algebraic point of view on automata is so fruitful. Intuitively, the fact 
that a language is star-free should translate into structural properties of the corresponding 
automaton and indeed, an earlier result of Schützenberger shows that $L \in SF$ if and only 
if its syntactic monoid contains no non-trivial group [Sch65]. This immediately provides an 
algorithm to decide definability in FO[<]. While it is fairly easy to show that a star-free 
language has a group-free syntactic monoid, the converse requires a very good understanding 
of the algebraic structure of these monoids. 

Over the last thirty years, many of the most natural fragments of MSO[<], including 
a number of temporal sublogics of LTL, have been characterized algebraically in the same 
way. It seems surprising, at first glance, that so many questions about the expressivity of 
logical fragments of MSO[<] have an algebraic answer but Straubing provided elements of 
a meta-explanation of the phenomenon [Str02]. For the majority of results simultaneously 
providing alternate logical, algebraic and combinatorial descriptions of the same class of 
regular languages, the greatest challenge is to establish the bridge between the combinatorial 
or logical characterization and the algebraic one. 

One objective of the present survey is to give an overview of the existing results in this 
line of work and to provide examples illustrating the expressive power of various classes of 
logical sentences. We also want to demonstrate that our understanding of the underlying 
mechanics of the interaction between logic and algebra in this context has recently grown 
much deeper. We have a much more systematic view today of the techniques involved in 
bridging the algebraic and logical perspectives on regular languages and this seems essential 
if we hope to extend them to more sophisticated contexts such as the theory of regular 
tree-languages. 

We focus particularly on one such technique known as the block-product/substitution 
principle. Substitutions are a natural logical construct: informally, a substitution replaces 
the label predicates $Q_a x$ of an MSO[<] sentence $\phi$ by a formula with free variable $x$. We 
want to understand the extra expressive power afforded to a class of sentences A when we 
substitute the label predicates of the $\phi$ in A by formulas from a class V. Under the right 
technical conditions, this logical operation can be put in correspondence with the block- 
product operation on pseudovarieties, an algebraic construct tied to bilateral semidirect 
products which reflects the combinatorial structure of substitutions. While this connection 
is not always as robust as we would hope it to be, it is still sufficient to derive the results of 
McNaughton-Papert and Schützenberger mentioned earlier, as well as results on temporal 
logic [CPP93, TW98, TW02, TW04], first-order sentences augmented with modular quantifiers 
[STT95] and sentences with a bounded number of variables [TW98, ST02, ST03, TT06]. 

Both logic and algebra have also contributed significantly to boolean circuit complexity. 
In particular, the circuit complexity classes $AC^0$, $CC^0$ and $ACC^0$ have interesting logical 
and algebraic characterizations: a language L lies in $AC^0$ if and only if it is definable by 
an FO sentence using arbitrary numerical predicates [GL84, Imm87] if and only if it can 
be recognized by a polynomial-length program over a finite aperiodic monoid [BT88]. This 
makes it possible to attack questions of circuit complexity using either the logical (e.g. [Str94, 
Lib04, LMSV01, Lyn82a, Str92, RS06, KLPT06]) or the algebraic perspective (e.g. [BST90, 
BS95, BS99, Bou05, GRS05, MPT91, The94]). Furthermore, the recent results on the 
expressivity of two-variable logical sentences using order [EVW97, TW98, ST03] have found 
surprising connections with regular languages which can be recognized by bounded depth 
circuits that use only O(n) gates or O(n) wires [KLPT06, KPT05] and the communication 
complexity of regular languages [TT05a, TT05b]. 

We begin by reviewing in Sections 2 and 3 the bases of the logical and algebraic approach 
to the study of regular languages and introduce the block-product/substitution principle. 
We then consider two types of applications of this principle in Sections 4 and 5 and finally 
explore some of the connections to computational complexity in Section 6. 



2. Logic on Words 

We are interested in considering logical sentences describing properties of finite words 
in $\Sigma^*$. Variables in these sentences refer to positions in a finite word. 

Example 2.1. 

Consider for instance the sentence 

$\phi : \exists x\, \exists y\, \forall z\, [ (x < y) \wedge Q_a x \wedge Q_a y \wedge [(x < z < y) \Rightarrow Q_c z] ]$. 

We think of $\phi$ as being true on words of $\{a, b, c\}^*$ that have positions $x$ and $y$ each holding 
the letter $a$ so that any position in between them holds a $c$. We can therefore think of this 
sentence as defining the regular language $\{a,b,c\}^* a c^* a \{a,b,c\}^*$. 
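
As a sanity check on Example 2.1, the following Python sketch evaluates the sentence by brute force over all positions and compares the outcome with the regular expression on every word up to a chosen length. The function names and the use of 0-based positions are illustrative assumptions, not notation from the text.

```python
import re
from itertools import product

def satisfies_phi(w: str) -> bool:
    """Brute-force semantics of the sentence of Example 2.1: there are
    positions x < y both holding 'a' such that every position z strictly
    between them holds 'c' (positions are 0-based here)."""
    n = len(w)
    return any(
        w[x] == "a" and w[y] == "a"
        and all(w[z] == "c" for z in range(x + 1, y))
        for x in range(n) for y in range(x + 1, n)
    )

# The regular language {a,b,c}* a c* a {a,b,c}* as a Python regex.
PATTERN = re.compile(r"[abc]*ac*a[abc]*")

def agrees_up_to(max_len: int) -> bool:
    """Compare the sentence and the regex on all words of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if satisfies_phi(w) != (PATTERN.fullmatch(w) is not None):
                return False
    return True
```

Such an exhaustive comparison proves nothing about longer words, but it is a cheap way to catch a mistranscribed quantifier or connective.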

There is a considerable amount of literature dealing with the expressive power of these 
types of logics. Straubing's book on the links between logic, algebra and circuit theory 
[Str94] is certainly the reference which is closest in spirit to our discussion. Other 
valuable surveys and books include |Lib041 iPmMl IPinOH ITho97j . 

More formally we construct formulas using variables corresponding to positions in a 
finite word $w \in \Sigma^*$, usual existential and universal quantifiers, the boolean constants T and 
F, and boolean connectives. Moreover, for every letter $a \in \Sigma$, we have a unary predicate 
$Q_a x$ (the 'content' or 'label' predicate) which over a finite word $w$ is interpreted as 'position 
$x$ in the word $w$ holds the letter $a$'. We further allow numerical predicates from some 
specified set $\mathcal{N} = \{R_1, \ldots, R_k\}$: the truth value of a numerical predicate $R_i(x_1, \ldots, x_{t_i})$ 
only depends on the values of the variables $x_j$ and on the length¹ of the string $w$ but not on 
the actual letters in those positions and we thus formally consider $R_i$ of arity $t_i$ as a subset 
of $\mathbb{N}^{t_i+1}$. 

A word structure over alphabet $\Sigma$ and variable set $\mathcal{V} = \{x_1, \ldots, x_k\}$ is a pair $(w, \vec{p})$ 
consisting of a word $w \in \Sigma^*$ and a list of pointers $\vec{p} = (p_1, \ldots, p_k)$ with $1 \le p_i \le |w|$ which 
associate each variable $x_i \in \mathcal{V}$ with a position $p_i$ in the string. We identify the word $w$ with 
the word structure $(w, \emptyset)$. Following [TW04], we further define a pointed word to be a word 
structure $(w, p)$ with a single pointer $p$ and a pointed language to be a set of pointed words. 
Alternatively, we can view a pointed word $(w, p)$ as a triple $(u, a, v) \in \Sigma^* \times \Sigma \times \Sigma^*$ with 
$u = w_1 \ldots w_{p-1}$, $a = w_p$ and $v = w_{p+1} \ldots w_{|w|}$. Accordingly, we view pointed languages as 
subsets of $\Sigma^* \times \Sigma \times \Sigma^*$. 

A simple extension of a word structure $(w, \vec{p})$ over $\Sigma, \mathcal{V}$ is a word structure $(w, \vec{p}')$ 
over $\Sigma, (\mathcal{V} \cup \{x_{k+1}\})$ such that $x_{k+1} \notin \mathcal{V}$ and $p_i = p'_i$ for $1 \le i \le k$. We can now formally 
define the semantics of our formulas in a natural way. If $w = w_1 \ldots w_t$ is a word and 
$\vec{p} = (p_1, \ldots, p_k)$ is a list of pointers to $w$, we have 

$(w, \vec{p}) \models Q_a x_i$ if $w_{p_i} = a$; 

$(w, \vec{p}) \models R_j(x_{i_1}, \ldots, x_{i_{t_j}})$ if $(p_{i_1}, \ldots, p_{i_{t_j}}, |w|) \in R_j$; 

$(w, \vec{p}) \models \exists x_{k+1}\, (\phi(x_{k+1}))$ if there exists a simple extension $(w, \vec{p}')$ of $(w, \vec{p})$ 
such that $(w, \vec{p}') \models \phi(x_{k+1})$; 

$(w, \vec{p}) \models \forall x_{k+1}\, (\phi(x_{k+1}))$ if $(w, \vec{p}') \models \phi(x_{k+1})$ for all simple extensions $(w, \vec{p}')$ of $(w, \vec{p})$. 

If $\phi$ is a sentence, i.e. a formula with no free variable, we denote as $L_\phi \subseteq \Sigma^*$ the language 
$L_\phi = \{w : (w, \emptyset) \models \phi\}$. Similarly, formulas naturally define a set of word structures and 
it is often useful to consider the special case of formulas with a single free variable. Such a 
formula defines a set of pointed words $(w, p)$ with $1 \le p \le |w|$, i.e. a pointed language. For 
any formula $\phi$ having a single free variable and $\Phi$ a class of such formulas, we denote as $P_\phi$ 
the pointed language $P_\phi = \{(w, p) : (w, p) \models \phi\}$ and $P(\Phi)$ the class of all $P_\phi$ with $\phi \in \Phi$. 

For a set of numerical predicates $\mathcal{N}$, we denote as FO[$\mathcal{N}$] both the class of first-order 
sentences constructed with predicates in $\mathcal{N}$ and, with a slight abuse of notation, the class 
L(FO[$\mathcal{N}$]) of languages definable by such sentences. The expressive power of this logic is of 
course highly dependent on the choice of numerical predicates used. In particular, various 
results mentioned in our introduction can be combined to obtain: 

Theorem 2.2. A language L is 

• definable in FO[<] if and only if L is a star-free regular language [MP71]; 

• definable in FO[+, *] (addition and multiplication) if and only if L lies in the boolean 
circuit complexity class DLOGTIME-uniform $AC^0$ [BIS90] (see Section 6); 

• definable in FO with no restriction on the class of numerical predicates used if and 
only if L lies in non-uniform $AC^0$. 



1 Allowing the truth value of numerical predicates to also depend on the length of w might seem 
non-standard. It is equivalent to assuming that formulas have access to the constant max and since this constant 
is easily definable in first-order, it would appear that this relaxed definition of numerical predicates is 
unnecessary. However max cannot be defined in very weak fragments of FO or in logics using modular 
quantifiers. In these cases the connection of logic to circuit complexity (see Section 6) is best preserved with 
this slightly more general formulation. 
There is a considerable body of work concerning the case where the available numerical 
predicates are order (<), successor (S) or both. Of course FO[S] is contained in FO[<] = 
FO[<, S] and that containment is known to be proper [Tho82]. In turn, FO[<] is clearly 
contained in MSO[S] and Büchi's theorem thus guarantees that all languages definable in 
these first-order fragments are regular. 

One can further augment the expressive power of first-order sentences by introducing 
modular quantifiers $\exists^{i \bmod m} x\, \phi(x)$ (for some $m \ge 2$ and $0 \le i \le m - 1$). Intuitively 
$\exists^{i \bmod m} x\, \phi(x)$ holds true if property $\phi$ is true for $i$ modulo $m$ positions $x$. Formally 

$(w, \vec{p}) \models \exists^{i \bmod m} x_{k+1}\, (\phi(x_{k+1}))$ if there exist $i$ modulo $m$ simple extensions $(w, \vec{p}')$ 
of $(w, \vec{p})$ such that $(w, \vec{p}') \models \phi(x_{k+1})$. 

The next three examples will serve to illustrate results of the later sections. 

Example 2.3. 

The sentence 

$\exists^{0 \bmod 2} x\, \exists y\, ( Q_a x \wedge (y < x) \wedge Q_b y \wedge [\forall z\, ((y < z < x) \Rightarrow Q_c z)] )$ 

holds true for words over the alphabet $\Sigma = \{a, b, c, d\}$ in which there are an even number 
of positions $x$ holding an $a$ and whose prefix lies in $\Sigma^* b c^*$. The sentence thus defines the 
regular language 

$[(dc^*a \cup c \cup b)^* bc^*a (dc^*a \cup c \cup b)^* bc^*a]^* (dc^*a \cup c \cup b)^*$. 

Example 2.4. 

The regular language $K = (b^*ab^*a)^* b \Sigma^*$ over the alphabet $\Sigma = \{a, b\}$ is defined by the 
sentence 

$\exists x\, ( Q_b x \wedge \exists^{0 \bmod 2} y\, [(y < x) \wedge Q_a y] )$. 
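
The sentence of Example 2.4 asks for a position holding a b that has an even number of a's strictly to its left. The sketch below, whose function names are illustrative assumptions, evaluates that reading of the modular quantifier directly and compares it with the regular expression for K on all short words.

```python
import re
from itertools import product

def satisfies_sentence(w: str) -> bool:
    """Some position x holds 'b' and the number of positions y < x
    holding 'a' is congruent to 0 modulo 2 (0-based positions)."""
    return any(
        w[x] == "b" and sum(1 for y in range(x) if w[y] == "a") % 2 == 0
        for x in range(len(w))
    )

# K = (b*ab*a)* b {a,b}* written as a Python regex.
K = re.compile(r"(b*ab*a)*b[ab]*")

def agrees_up_to(max_len: int) -> bool:
    """Compare the sentence and the regex on all words of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if satisfies_sentence(w) != (K.fullmatch(w) is not None):
                return False
    return True
```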

Example 2.5. 

The sentence 

$\exists x\, \forall y\, ( Q_a x \wedge [(y < x) \Rightarrow \neg Q_a y] \wedge \exists^{0 \bmod 2} z\, [(x < z) \wedge Q_c z] )$ 

is true of words over the alphabet $\{a, b, c\}$ such that the position $x$ holding the first $a$ has 
a suffix containing an even number of $c$'s. Thus the language defined is 

$\{b, c\}^* a (\{a, b\}^* c \{a, b\}^* c \{a, b\}^*)^*$. 

We denote as FO + MOD[$\mathcal{N}$] the class of first-order sentences constructed with the 
content predicates and numerical predicates in $\mathcal{N}$ and with existential, universal and modular 
quantifiers. We also denote as MOD[$\mathcal{N}$] the class of sentences in which only modular 
quantifiers are used. Once again Büchi's theorem guarantees that L(FO + MOD[<]) contains 
only regular languages because the modular quantifiers can be simulated in monadic 
second order. 

Definition 2.6. Let $\Sigma$ be an alphabet and $\Phi = \{\phi_1(x), \ldots, \phi_k(x)\}$ be a set of formulas over $\Sigma$ 
with at most² one free variable, say $x$. A $\Phi$-substitution $\sigma$ over $\Sigma$ is a function mapping any 
sentence $\psi$ over the alphabet $2^\Phi$ (the power set of $\Phi$) to a sentence $\sigma(\psi)$ over the alphabet 
$\Sigma$ as follows. We assume without loss of generality that the set of variables used in $\psi$ is 
disjoint from the set of variables in any $\phi_i$ and replace each occurrence of the predicate $Q_S y$ 
in $\psi$ with $S \subseteq \Phi$ by the conjunction $\bigwedge_{\phi_i \in S} \phi_i(y) \wedge \bigwedge_{\phi_i \notin S} \neg\phi_i(y)$. 



2 It might be that some of the $\phi_i$ contain no occurrence of the free variable $x$ and are thus sentences. 
The following lemma formalizes the semantics of substitutions. 

Lemma 2.7. Let $\sigma$ be a $\Phi$-substitution and for any $w = w_1 \ldots w_n$ in $\Sigma^*$ let $\sigma^{-1}(w)$ be 
the word $u_1 \ldots u_n$ over the alphabet $2^\Phi$ with $u_i = \{\phi_j : (w, i) \models \phi_j\}$. Then $w \models \sigma(\psi)$ iff 
$\sigma^{-1}(w) \models \psi$. 

The proof is straightforward and is omitted [TW04, TT05b]. 
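
To make Lemma 2.7 concrete, here is a small Python sketch for one hand-picked set of two formulas and one sentence over the power-set alphabet. The formulas are modelled as Python predicates on a word and a 0-based position; the names `PHI`, `psi` and `sigma_psi` are assumptions made for this illustration only.

```python
from itertools import product

# Two formulas with free variable x over {a, b}, modelled as predicates
# on a pointed word (w, i) with a 0-based position i.
PHI = {
    "is_b": lambda w, i: w[i] == "b",
    "even_as_before": lambda w, i: w[:i].count("a") % 2 == 0,
}

def sigma_inverse(w: str):
    """The word u_1...u_n over 2^PHI: u_i is the set of formulas true at i."""
    return [
        frozenset(name for name, holds in PHI.items() if holds(w, i))
        for i in range(len(w))
    ]

S = frozenset(PHI)  # the letter {is_b, even_as_before} of the alphabet 2^PHI

def psi(u) -> bool:
    """psi = 'some position x satisfies Q_S x', evaluated on a word over 2^PHI."""
    return any(letter == S for letter in u)

def sigma_psi(w: str) -> bool:
    """sigma(psi): Q_S x is replaced by the conjunction of the formulas in S
    and of the negations of the formulas outside S (there are none here)."""
    return any(
        all(PHI[name](w, x) for name in S)
        and not any(PHI[name](w, x) for name in PHI if name not in S)
        for x in range(len(w))
    )

def lemma_holds_up_to(max_len: int) -> bool:
    """Check w |= sigma(psi) iff sigma^{-1}(w) |= psi on all short words."""
    return all(
        sigma_psi(w) == psi(sigma_inverse(w))
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )
```

With these choices the substituted sentence is exactly the sentence of Example 2.4, so the decomposition splits that sentence into a one-quantifier outer sentence and two simpler inner formulas.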

If $\Gamma$ is a class of sentences and $\Lambda$ is a class of formulas with one free variable we denote 
by $\Gamma \circ \Lambda$ the class of sentences which are Boolean combinations of sentences in $\Lambda$ and of 
sentences obtained by applying to a sentence $\psi$ of $\Gamma$ a $\Phi$-substitution for some $\Phi \subseteq \Lambda$. 
Substitutions provide a natural way to decompose complex sentences into simpler parts. 
For instance, the class of FO[<] sentences of quantifier depth $k$ can be decomposed as 
the class of sentences of depth 1 in which label predicates are replaced with formulas of 
quantifier depth $k - 1$. 

We are most interested in the case above where $\Gamma$ is a class of sentences although the 
definition of $\Gamma \circ \Lambda$ can be naturally extended to the case where $\Gamma$ is a class of formulas with 
one free variable. Under this more general setting the substitution operator is associative: 
if $\Gamma, \Lambda, \Psi$ are classes of formulas with at most one free variable, then $\Gamma \circ (\Lambda \circ \Psi) = (\Gamma \circ \Lambda) \circ \Psi$. 

3. Regular Languages, Finite Monoids and the Block Product/Substitution Principle 

We give in the first half of this section a brief introduction to the algebraic theory 
of regular languages which is required for the sequel. A very thorough overview of the 
subject can be found in the survey of Pin [Pin97] or his earlier book [Pin86]. We also 
refer the interested reader to the survey of Weil which provides a shorter, more superficial 
introduction but considers more broadly the notion of algebraic recognizability for trees, 
infinite words, traces, pomsets and so on [Wei04]. In the section's second half, we state 
and prove the block-product/substitution principle which underlies many of the results 
presented in Sections 4 and 5. 

3.1. Regular Languages, Automata and Finite Monoids. 

A semigroup S is a set with a binary associative operation which we denote multiplicatively. 
A monoid M is a semigroup with a distinguished identity element $1_M$. In the sequel, 
S and M always denote respectively a finite semigroup and a finite monoid. The set $\Sigma^+$ of 
finite non-empty words over $\Sigma$ forms a semigroup under concatenation (the free semigroup 
over $\Sigma$) while the set $\Sigma^*$ of finite words over $\Sigma$ is a monoid with identity $\epsilon$, the empty word. 

We say that M divides the monoid N and write $M \prec N$ if M is the homomorphic image 
of a submonoid of N. A class V of finite monoids forms a pseudovariety if it is closed under 
finite direct product, homomorphic images and formation of submonoids. In particular, the 
following classes all form pseudovarieties: 

• finite monoids M; 

• finite groups G; 

• finite solvable groups $G_{sol}$; 

• finite Abelian groups Ab; 

• finite solvable monoids $M_{sol}$, i.e. monoids whose subgroups are solvable. 
A monoid M is said to be aperiodic or 'group free' if all its subgroups are trivial and 
we denote as A the pseudovariety of finite aperiodic monoids. The pseudovariety SL of 
semilattices consists of finite monoids which are idempotent ($x^2 = x$) and commutative 
($xy = yx$) and it is easy to see that $SL \subseteq A$. 

An element $e$ of M is idempotent if $e^2 = e$. For any finite monoid M, there is always an 
integer $\omega$, the exponent of M, such that $x^\omega$ is idempotent for all $x \in M$. Pseudovarieties can 
often be conveniently described³ as the class of monoids satisfying a certain set of identities. 
For instance, the pseudovariety of groups G is the class of monoids satisfying $x^\omega y = y x^\omega = y$ 
(i.e. the only idempotent is the identity element of the group) and the pseudovariety A of 
aperiodics is defined by the identity $x^{\omega+1} = x^\omega$. 

We say that the language $L \subseteq \Sigma^*$ is recognized by M if there exists a homomorphism 
$p : \Sigma^* \to M$ and a subset $F \subseteq M$ such that $L = p^{-1}(F)$. A simple variant of Kleene's 
theorem states that a language is regular if and only if it can be recognized by a finite 
monoid. When one chooses to consider languages as subsets of $\Sigma^+$ it is more natural to 
define recognition by finite semigroups and, for technical reasons, the algebraic theory of 
regular languages is slightly altered. The two parallel approaches coexist but cannot be 
completely reconciled despite their close relationship [Pin97]. For simplicity, we focus on 
the first case. 

The syntactic congruence of a language $L \subseteq \Sigma^*$ is defined by setting $x \equiv_L y$ if and only if 

$uxv \in L \Leftrightarrow uyv \in L$ for all $u, v \in \Sigma^*$. 

The Myhill-Nerode theorem states that $\equiv_L$ has finite index if and only if L is regular. The 
syntactic monoid $M(L)$ of L is the quotient $\Sigma^*/\equiv_L$ and is thus finite if and only if L is 
regular. It can be shown that $M(L)$ recognizes L and divides any monoid also recognizing 
L. 

Example 3.1. 

Consider the language $L = (ab)^*$. It is easy to see that for any word $u$ containing two 
consecutive $a$ or two consecutive $b$ we have $u \notin L$ and, moreover, $xuy \notin L$ for any 
$x, y \in \{a, b\}^*$. Thus, any two such $u$ are equivalent under the syntactic congruence and we denote 
the corresponding element of the syntactic monoid as 0 since it will satisfy $0m = m0 = 0$ 
for all monoid elements $m$. Simple computation shows that the syntactic monoid of L is 
the six-element monoid $B_2 = \{1, a, b, ab, ba, 0\}$ where multiplication is specified by $aba = a$, 
$bab = b$, $aa = bb = 0$. It is often convenient to name elements of a syntactic monoid using 
words in $\Sigma^*$ that are minimal-length representatives for the different equivalence classes of 
the syntactic congruence. 
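
The "simple computation" of Example 3.1 can be mechanised. Since the syntactic monoid of a regular language is the transition monoid of its minimal automaton, the sketch below closes the letter transformations of a three-state complete DFA for (ab)* under composition and names each element by a shortest word. The state numbering and the helper names are assumptions of this illustration.

```python
from collections import deque

# Complete minimal DFA of (ab)*: state 0 is initial and accepting,
# state 1 means "an a was just read", state 2 is the rejecting sink.
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}

def transition_monoid(delta, n_states, alphabet):
    """Map each transformation of the state set to a shortest word inducing it."""
    letters = {a: tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet}
    identity = tuple(range(n_states))
    names = {identity: ""}
    queue = deque([identity])
    while queue:  # breadth-first search, so names are of minimal length
        t = queue.popleft()
        for a in alphabet:
            u = tuple(letters[a][q] for q in t)  # act by t first, then by a
            if u not in names:
                names[u] = names[t] + a
                queue.append(u)
    return names

def action_of(word, delta, n_states):
    """Transformation of the state set induced by reading `word`."""
    t = tuple(range(n_states))
    for a in word:
        t = tuple(delta[(q, a)] for q in t)
    return t

MONOID = transition_monoid(DELTA, 3, "ab")
```

The six representatives found are the empty word (the identity 1), a, b, ab, ba and aa, the last one standing for the zero element.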

For $u \in \Sigma^*$, the right-quotient of L by $u$ is $Lu^{-1} = \{x : xu \in L\}$ and the left-quotient is 
defined symmetrically. A class $\mathcal{V}$ of languages is a variety of languages⁴ if it is closed under 
boolean operations, left and right quotients and under inverse homomorphisms between 
free monoids (i.e. if $L \in \mathcal{V}$ and $p : \Gamma^* \to \Sigma^*$ is a homomorphism then $p^{-1}(L) \in \mathcal{V}$). The 
very tight relationship existing between varieties of languages and pseudovarieties of finite 
monoids is the cornerstone of algebraic automata theory. 

3 In fact, every pseudovariety has a possibly infinite set of defining pseudo-identities (see e.g. [Pin97] for 
a formal treatment). 

4 We should note that we are bypassing a technical yet important detail in our definition of varieties of 
languages. Strictly speaking, a variety of languages should not be defined as a set of languages but rather 
as an operator which assigns to each finite alphabet a set of languages over that alphabet. While that 
distinction is occasionally important in technical proofs, we prefer the slightly less formal description given 
here since it simplifies the presentation. 

Theorem 3.2 (Variety Theorem [Eil76]). There is a natural bijection between pseudovarieties 
of finite monoids and varieties of languages: if V is a pseudovariety of finite monoids, 
then the class L(V) of regular languages recognized by some monoid in V forms a language 
variety. 

Conversely, if $\mathcal{V}$ is a variety of languages then the pseudovariety of monoids V generated 
by the syntactic monoids of languages in $\mathcal{V}$ is such that $L(V) = \mathcal{V}$. 

One of the main objectives of algebraic automata theory is to explicitly relate natu- 
ral varieties of regular languages with their algebraic counterpart or, conversely, describe 
combinatorially the variety of regular languages corresponding to a given pseudovariety of 
monoids. An algebraic characterization of a variety of languages $\mathcal{V}$ provides a natural approach 
for deciding if a given regular language K belongs to $\mathcal{V}$: checking if K belongs to $\mathcal{V}$ 
is equivalent to deciding if its syntactic monoid $M(K)$ belongs to the pseudovariety V such 
that $L(V) = \mathcal{V}$. The latter formulation of the problem is often easier to handle. In particular, 
all the pseudovarieties introduced thus far in this survey are such that determining 
membership of $M(K)$ in V amounts to checking that the monoid satisfies some finite set of 
defining identities (e.g. $x^{\omega+1} = x^\omega$ for the pseudovariety A of aperiodics). This requires an 
amount of time polynomial in $|M(K)|$. Although $|M(K)|$ is in general exponentially larger 
than the size of the representation of K, the problem of testing whether $K \in L(V)$ for a K 
specified by an automaton or a regular expression can be shown to lie in PSPACE for all 
pseudovarieties considered thus far. There are however pseudovarieties for which membership 
is undecidable and many problems in this line of work remain open [Alm94, Pin97]. 
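
The decision procedure sketched in this paragraph can be illustrated for the pseudovariety of aperiodic monoids. The Python sketch below builds the transition monoid of a complete DFA explicitly and checks that the powers of every element eventually stabilise, which is one way of testing the identity $x^{\omega+1} = x^\omega$. It is a naive illustration under the assumption that the given DFA is minimal; it takes time polynomial in the size of the monoid and is not the PSPACE procedure mentioned above. All names are hypothetical.

```python
from collections import deque

def transition_monoid(delta, n_states, alphabet):
    """All transformations of the state set induced by words (tuples q -> image)."""
    letters = [tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet]
    identity = tuple(range(n_states))
    seen = {identity}
    queue = deque([identity])
    while queue:
        t = queue.popleft()
        for g in letters:
            u = tuple(g[q] for q in t)  # act by t first, then by the letter
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen

def is_aperiodic(monoid) -> bool:
    """True iff every element x satisfies x^(n+1) = x^n for some n."""
    for x in monoid:
        powers = [x]
        while True:
            nxt = tuple(x[q] for q in powers[-1])  # next power of x
            if nxt == powers[-1]:
                break            # the powers of x stabilise
            if nxt in powers:
                return False     # the powers enter a non-trivial cycle (a group)
            powers.append(nxt)
    return True

# Minimal complete DFAs, assumed for illustration: (ab)* and (aa)*.
AB_STAR = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0,
           (2, "a"): 2, (2, "b"): 2}
AA_STAR = {(0, "a"): 1, (1, "a"): 0}
```

On these two inputs the check separates (ab)*, whose monoid is aperiodic, from (aa)*, whose monoid is the two-element group.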

The best-known instance of an algebraic characterization of a variety of languages is 
Schützenberger's theorem: 

Theorem 3.3 ([Sch65]). A regular language is star-free if and only if its syntactic monoid 
is group-free, i.e. $L(A) = SF$. 

The theorem of McNaughton and Papert [MP71], whose proof we sketch in Section 4, 
further shows that a language is star-free if and only if it is definable in FO[<]. 

Example 3.4. 

A simple calculation shows that the syntactic monoid of the language $(ab)^*$, which we 
considered in Example 3.1, has exponent 2 and satisfies $x^3 = x^2$. It is therefore aperiodic 
and so there must exist a star-free expression and an FO[<] sentence defining $(ab)^*$. To 
construct a star-free expression, it suffices to note that $(ab)^*$ is the set of words starting with 
$a$, ending with $b$ and having no consecutive $a$'s or consecutive $b$'s. Since the complement of 
the empty set $\emptyset^c$ is simply $\{a, b\}^*$, the following is a star-free expression defining $(ab)^*$: 

$a\emptyset^c \cap \emptyset^c b \cap (\emptyset^c aa \emptyset^c)^c \cap (\emptyset^c bb \emptyset^c)^c$ 

The corresponding FO[<] sentence is 

$\forall x\, \forall y\, [ (Q_b x \rightarrow [\exists z\, (z < x)]) \wedge (Q_a x \rightarrow [\exists z\, (x < z)]) \wedge 
([((x \ne y) \wedge Q_a x \wedge Q_a y) \vee ((x \ne y) \wedge Q_b x \wedge Q_b y)] \rightarrow (\exists z\, [(x < z < y) \vee (y < z < x)])) ]$. 
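
The FO[<] sentence of Example 3.4 can be checked mechanically against (ab)* on short words. The sketch below is a brute-force evaluator written for this one sentence, with 0-based positions and hypothetical function names; it reads the three conjuncts as "every b has a predecessor", "every a has a successor" and "two distinct positions with the same letter have a position strictly between them".

```python
import re
from itertools import product

def satisfies_sentence(w: str) -> bool:
    """Brute-force evaluation of the FO[<] sentence over the alphabet {a, b}."""
    positions = range(len(w))
    for x in positions:
        # Q_b x -> exists z (z < x)
        if w[x] == "b" and not any(z < x for z in positions):
            return False
        # Q_a x -> exists z (x < z)
        if w[x] == "a" and not any(x < z for z in positions):
            return False
        for y in positions:
            # distinct positions with equal letters need a position in between
            if x != y and w[x] == w[y]:
                if not any(x < z < y or y < z < x for z in positions):
                    return False
    return True

AB_STAR = re.compile(r"(ab)*")

def agrees_up_to(max_len: int) -> bool:
    """Compare the sentence with (ab)* on all words of length <= max_len."""
    return all(
        satisfies_sentence(w) == (AB_STAR.fullmatch(w) is not None)
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )
```

Note that the empty word satisfies the sentence vacuously, in agreement with its membership in (ab)*.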

The notion of recognition of a language by a monoid can naturally be extended to 
pointed languages: we say that the pointed language $K \subseteq \Sigma^* \times \Sigma \times \Sigma^*$ is recognized by M 
if there are homomorphisms $h_l, h_r : \Sigma^* \to M$ and a set of triples $T \subseteq (M \times \Sigma \times M)$ such 
that 

$K = \{(w, p) : (h_l(w_1 \ldots w_{p-1}), w_p, h_r(w_{p+1} \ldots w_{|w|})) \in T\}$. 

For a pseudovariety V we denote as P(V) the set of pointed languages recognized by a 
monoid in V. Abusing our terminology, it is convenient to think of ordinary words in $\Sigma^*$ 
as pointed words with no pointer and thus view L(V) as a subset of P(V). Note that P(V) is 
closed under boolean operations and inverse homomorphisms. 

While relating star-freeness, aperiodicity and FO-definability is far from trivial, there 
are cases in which such three-way equivalences are easy to obtain and the following lemma 
is particularly useful in inductive arguments. Let $FO_1[<]$ denote the class of first-order 
sentences with a single quantified variable and let $FOF_1[<]$ denote the class of first-order 
formulas with a single quantified variable and at most one free variable. We similarly 
denote as $MOD_1[<]$ and $MODF_1[<]$ the analogous classes when ordinary quantifiers 
are replaced with modular ones. 

Recall that SL and Ab respectively denote the pseudovarieties of semilattices and 
Abelian groups. 

Lemma 3.5. 

(1) L(SL) is the Boolean algebra generated by languages of the form $\Sigma^* a \Sigma^*$ where $\Sigma$ 
is a finite alphabet and $a \in \Sigma$. Furthermore $L(SL) = L(FO_1[<])$ and 
$P(SL) = P(FOF_1[<])$. 

(2) L(Ab) is the Boolean algebra generated by languages of the form $\{w : |w|_a \equiv i \pmod{m}\}$ 
with $i, m \in \mathbb{N}$. Furthermore $L(Ab) = L(MOD_1[<])$ and 
$P(Ab) = P(MODF_1[<])$. 

Proof sketch. The syntactic monoid of the language $\Sigma^* a \Sigma^*$ of words containing an $a$ is 
the two-element semilattice $\{1, 0\}$ with multiplication given by $x0 = 0x = 0$. Thus, any 
boolean combination of such languages can be recognized by a direct product of copies of 
this semilattice. 

Conversely, if M is a semilattice and $p : \Sigma^* \to M$ is a homomorphism, then by commutativity 
and idempotency, the value of $p(w)$ only depends on the set of letters occurring in 
$w$. Thus, if $F \subseteq M$ then $p^{-1}(F)$ is in the boolean algebra generated by the $\Sigma^* a \Sigma^*$. 

The language $\Sigma^* a \Sigma^*$ can be defined by the sentence $\exists x\, Q_a x$ and any $FO_1$ sentence is a 
boolean combination of sentences of that form and therefore $L(SL) = L(FO_1[<])$. Similarly, 
$FOF_1[<]$ formulas with free variable $y$ and bound variable $x$ are boolean combinations 
of formulas of the form $\exists x\, [(x * y) \wedge Q_a x]$ where $* \in \{<, >, =\}$ and one can conclude 
$P(SL) \subseteq P(FOF_1[<])$. 

The case of Abelian groups is handled similarly: one can show that the variety of 
languages L(Ab) consists of languages L such that membership of a word $w$ in L only 
depends on the number of occurrences of each letter in $w$ modulo some integer $m$. □ 
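
The two invariants used in this proof sketch, the set of letters of a word and its letter counts modulo m, are both monoid homomorphisms, which the following Python sketch checks exhaustively on short words. The target monoids (sets under union, tuples under componentwise addition modulo m) and all function names are choices made for this illustration.

```python
from itertools import product

def content(w: str) -> frozenset:
    """Homomorphism into the semilattice (2^Sigma, union): the letters of w."""
    return frozenset(w)

def counts_mod(w: str, alphabet: str, m: int) -> tuple:
    """Homomorphism into the Abelian group (Z_m)^|Sigma|: letter counts mod m."""
    return tuple(w.count(a) % m for a in alphabet)

def is_homomorphism_up_to(max_len: int, alphabet: str = "ab", m: int = 3) -> bool:
    """Check h(uv) = h(u) h(v) for both maps on all pairs of short words."""
    words = ["".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n)]
    for u in words:
        for v in words:
            if content(u + v) != content(u) | content(v):
                return False
            expected = tuple(
                (x + y) % m
                for x, y in zip(counts_mod(u, alphabet, m), counts_mod(v, alphabet, m))
            )
            if counts_mod(u + v, alphabet, m) != expected:
                return False
    return True

# "contains an a but no b" is the preimage of a subset F of the semilattice.
F = {frozenset("a")}

def contains_a_but_no_b(w: str) -> bool:
    return content(w) in F
```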

The above lemma might give the impression that whenever V is a pseudovariety such 
that the class of languages L(V) has a meaningful logical description, then the class of 
pointed languages P(V) also has a meaningful (and closely related) logical description. 
This is unfortunately not the case and, in fact, there are very few classes A of formulas 
with one free variable whose expressive power can be characterized algebraically as P(V) 
for some pseudovariety V. 
3.2. Block-Products and Substitutions. 

Let M and N be finite monoids. To distinguish the operation of M and N, we denote 
the operation of M as + and its identity element as 0, although this operation is not 
necessarily commutative. A left-action of N on M is a function mapping pairs 
$(n, m) \in N \times M$ to $nm \in M$ and satisfying $n(m_1 + m_2) = nm_1 + nm_2$, $n_1(n_2 m) = (n_1 n_2)m$, 
$n0 = 0$ and $1m = m$. Given a left-action of N on M, the semidirect product $M \rtimes N$ (with 
respect to this action) is the monoid with elements in $M \times N$ and multiplication defined 
as $(m_1, n_1)(m_2, n_2) = (m_1 + n_1 m_2, n_1 n_2)$. It can be verified that this operation is indeed 
associative and that (0, 1) acts as the identity element. 

Right actions are defined symmetrically and naturally lead to the notion of reverse 
semidirect products. If we have both a right and a left-action of N on M that further satisfy 
$n_1(m n_2) = (n_1 m) n_2$, we define the bilateral semidirect product $M ** N$ as the monoid with 
elements in $M \times N$ and multiplication defined as $(m_1, n_1)(m_2, n_2) = (m_1 n_2 + n_1 m_2, n_1 n_2)$. 
This operation is associative and (0, 1) acts as an identity for it. Semidirect products (resp. 
reverse semidirect products) can then be viewed as the special case of bilateral semidirect products for 
which the right (resp. left) action on M is trivial. The block product of the pseudovarieties 
V, W, denoted $V \Box W$, is the pseudovariety generated by all bilateral semidirect products 
$M ** N$ with $M \in V$, $N \in W$. 
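
The claims that the bilateral multiplication is associative and has an identity can be checked by brute force on a small instance. In the sketch below, M is the cyclic group of order 3 written additively and N is the cyclic group of order 2 acting on M by negation; these particular monoids and actions, and every name in the code, are assumptions chosen for illustration. Because N is also encoded additively here, its identity is 0 and the identity of the product is the pair (0, 0).

```python
from itertools import product

M = range(3)  # Z_3, written additively as in the text
N = range(2)  # Z_2; addition modulo 2 plays the role of the product in N

def m_add(m1, m2):
    return (m1 + m2) % 3

def n_mul(n1, n2):
    return (n1 + n2) % 2

def negate_left(n, m):       # left action n.m: the non-identity element negates m
    return m if n == 0 else (-m) % 3

def negate_right(m, n):      # right action m.n, defined symmetrically
    return m if n == 0 else (-m) % 3

def trivial_right(m, n):     # trivial right action
    return m

def bilateral_product(left, right):
    """Multiplication (m1, n1)(m2, n2) = (m1.n2 + n1.m2, n1 n2)."""
    def mult(p, q):
        (m1, n1), (m2, n2) = p, q
        return (m_add(right(m1, n2), left(n1, m2)), n_mul(n1, n2))
    return mult

ELEMENTS = list(product(M, N))

def actions_are_compatible(left, right) -> bool:
    """The condition n1(m n2) = (n1 m) n2."""
    return all(left(n1, right(m, n2)) == right(left(n1, m), n2)
               for n1 in N for n2 in N for m in M)

def is_monoid(mult) -> bool:
    """Associativity and two-sided identity (0, 0), checked exhaustively."""
    assoc = all(mult(mult(p, q), r) == mult(p, mult(q, r))
                for p in ELEMENTS for q in ELEMENTS for r in ELEMENTS)
    unit = all(mult((0, 0), p) == p == mult(p, (0, 0)) for p in ELEMENTS)
    return assoc and unit

bilateral = bilateral_product(negate_left, negate_right)
semidirect = bilateral_product(negate_left, trivial_right)
```

With the trivial right action the construction reduces to the ordinary semidirect product, which for this choice of M and N is a non-commutative monoid with six elements.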

(Bilateral) semidirect products are useful to decompose finite monoids of potentially 
complex structure into simpler components. For instance, it is well known that every finite 
group is isomorphic to an iterated semidirect product $G_1 \rtimes (G_2 \rtimes (\cdots (G_{k-1} \rtimes G_k) \cdots))$ 
where each $G_i$ is a simple group and that a group is solvable if and only if there is such 
a decomposition in which all $G_i$ are cyclic groups of prime order. For monoids which 
are not groups, the Krohn-Rhodes theorem [KR65] states that every finite monoid divides 
an iterated semidirect product (bracketed as above) where every term is either a simple 
group or the 'set/reset monoid' (or 'flip-flop') i.e. the three element monoid $\{1, s, r\}$ with 
multiplication satisfying $1x = x1 = x$, $xs = s$ and $xr = r$ for each $x$. The bilateral 
semidirect product allows decompositions with even simpler factors: every finite monoid 
divides an iterated bilateral semidirect product of the form 

$M_1 ** (M_2 ** (M_3 ** (\cdots M_{k-1} ** M_k)))$ 

where each $M_i$ is either a simple group or the two element semilattice [RT89]. 

Let $V_0$ be the trivial pseudovariety (containing only the trivial monoid) and for $i \ge 0$ 
define inductively the pseudovarieties $V_{2i+1} = G \Box V_{2i}$ and $V_{2i+2} = SL \Box V_{2i+1}$. The last 
result stated in the previous paragraph implies in particular that the pseudovariety M of all 
finite monoids is the union of the $V_i$ or, equivalently, that M is the smallest pseudovariety 
W satisfying $G \Box W = W$ and $SL \Box W = W$. The following theorem lists fundamental 
results that similarly decompose important pseudovarieties in terms of block-products. All 
results are either due to Rhodes and Tilson or can be inferred from their work [RT89]. 

Theorem 3.6. 

(1) The pseudovariety A of aperiodic monoids is the smallest pseudovariety satisfying
SL □ A = A.

(2) The pseudovariety $\mathbf{G}_{sol}$ of solvable groups is the smallest pseudovariety satisfying
$\mathbf{Ab} \Box \mathbf{G}_{sol} = \mathbf{G}_{sol}$.

(3) The pseudovariety $\mathbf{M}_{sol}$ of solvable monoids is the smallest pseudovariety satisfying
$\mathbf{Ab} \Box \mathbf{M}_{sol} = \mathbf{M}_{sol}$ and $\mathbf{SL} \Box \mathbf{M}_{sol} = \mathbf{M}_{sol}$.

The block product operation on pseudovarieties is not associative: it can be shown that
(U □ V) □ W ⊆ U □ (V □ W) but this inclusion is strict in general. The block product
is mostly used as a means of decomposing large and complex pseudovarieties into smaller,
simpler ones and the most classical applications of iterated block-products have relied on
the stronger right-to-left bracketing. Theorem 3.6 for instance states that the pseudovariety
of aperiodics is the union of all pseudovarieties of the form

SL □ (SL □ (... □ (SL □ SL) ...)).

In Section 5 we show the relevance of the weaker left-to-right bracketing of iterated
block-products when analyzing the expressive power of two-variable sentences.

The languages recognized by V □ W can be conveniently described in terms of languages
recognized by V and W. For a monoid $N \in \mathbf{W}$, an $N$-transduction $\tau$ is a function
determined by two homomorphisms $h_l, h_r : \Sigma^* \to N$ and mapping words in $\Sigma^*$ to words in
$(N \times \Sigma \times N)^*$. For a word $w = w_1 \dots w_n \in \Sigma^*$ we set

$\tau(w) = \tau(w_1)\tau(w_2) \cdots \tau(w_n)$

with

$\tau(w_i) = (h_l(w_1 \cdots w_{i-1}), w_i, h_r(w_{i+1} \cdots w_n))$.

For a language $K \subseteq (N \times \Sigma \times N)^*$, let $\tau^{-1}(K) = \{w \in \Sigma^* : \tau(w) \in K\}$.
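The definition of an $N$-transduction can be sketched in a few lines of Python. In the sketch below the monoid $N$ is passed as a multiplication function `mul` with identity `one`, and the two homomorphisms are given by their values on letters; these names, and the example monoid (integers modulo 2 counting occurrences of the letter a), are assumptions chosen for illustration only.

```python
def transduce(word, h_left, h_right, mul, one):
    """Return tau(w) as a list of triples (h_l(prefix), letter, h_r(suffix)).

    h_left and h_right map each letter to an element of N; they are
    extended to words as homomorphisms using mul and one.
    """
    n = len(word)
    # prefix[i] = h_l(word[:i]), computed left to right
    prefix = [one]
    for letter in word:
        prefix.append(mul(prefix[-1], h_left[letter]))
    # suffix[i] = h_r(word[i:]), computed right to left
    suffix = [one] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = mul(h_right[word[i]], suffix[i + 1])
    # position i sees the image of everything before it and everything after it
    return [(prefix[i], word[i], suffix[i + 1]) for i in range(n)]


# Hypothetical example: N = Z_2, both homomorphisms count the letter a modulo 2.
def add_mod2(x, y):
    return (x + y) % 2


PARITY_OF_A = {"a": 1, "b": 0}

if __name__ == "__main__":
    print(transduce("aba", PARITY_OF_A, PARITY_OF_A, add_mod2, 0))
```

Each output triple records what the two homomorphisms see strictly to the left and strictly to the right of a position, which is exactly the information a language $K$ over $N \times \Sigma \times N$ may test.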

Theorem 3.7 ([Str94, Pin97]). A regular language lies in L(V □ W) iff it is a Boolean
combination of languages in L(W) and languages $\tau^{-1}(K)$ for some $K \in L(\mathbf{V})$ and
$N$-transduction $\tau$ with $N \in \mathbf{W}$.

Proof sketch. The proof is too technical to present in full detail but we give a brief
overview of the main idea for completeness. The argument relies on the very definition of
multiplication in the bilateral semidirect product $M ** N$. Recall that
$(m_1, n_1)(m_2, n_2) = (m_1 n_2 + n_1 m_2, n_1 n_2)$ and so, by extension, if
$(m_1, n_1), (m_2, n_2), \dots, (m_t, n_t)$ are elements
of $M ** N$ then their product $(m_1, n_1) \cdots (m_t, n_t)$ in $M ** N$ is given by

$(m_1 n_2 n_3 \cdots n_t + n_1 m_2 n_3 \cdots n_t + \cdots + n_1 \cdots n_{t-2} m_{t-1} n_t + n_1 \cdots n_{t-1} m_t, \; n_1 \cdots n_t)$.

Fix an element $(m, n) \in M ** N$ and consider the language $E_{(m,n)} \subseteq (M ** N)^*$
consisting of finite sequences $(m_1, n_1), \dots, (m_t, n_t)$ of elements of $M ** N$ that multiply out
to $(m, n)$. Similarly, let $E_{(m,\cdot)}$ be the union over all $n$ of all $E_{(m,n)}$ and let $E_{(\cdot,n)}$ be the union
over all $m$ of the $E_{(m,n)}$: we thus have $E_{(m,n)} = E_{(m,\cdot)} \cap E_{(\cdot,n)}$. Finally denote by $E_m \subseteq M^*$
the set of words of $M^*$ that multiply out to $m$ in $M$. Let $\tau$ be the $N$-transduction which
maps $w \in (M ** N)^*$ to $\tau(w) = \tau(w_1) \cdots \tau(w_{|w|})$ with $\tau(w_i) = (n_1 \cdots n_{i-1}, m_i, n_{i+1} \cdots n_t)$.
If we identify these triples with the element $n_1 \cdots n_{i-1} m_i n_{i+1} \cdots n_t$ of $M$ then by the above
expression we obtain immediately that $E_{(m,\cdot)}$ is $\tau^{-1}(E_m)$. Note that if $M \in \mathbf{V}$ then
$E_m \in L(\mathbf{V})$. On the other hand it is easy to see that if $N \in \mathbf{W}$ then $E_{(\cdot,n)} \in L(\mathbf{W})$. This argument
in fact suffices to establish that every language in L(V □ W) is a Boolean combination of
languages in L(W) and languages $\tau^{-1}(K)$ for some $K \in L(\mathbf{V})$ and $N$-transduction $\tau$ with
$N \in \mathbf{W}$.

The converse statement, while more involved technically, proceeds along the same lines.

□

There is a striking similarity between the notion of transduction and that of substitution
discussed in the previous section. We formalize this crucial correspondence in the
next lemma, which we refer to as the block-product/substitution principle. Thérien and
Wilke [TW04] were the first to use this specific terminology although it is fair to say that
the idea was implicitly present in the work of Straubing [Str94]. In [TW04], the lemma is
stated for temporal logics. The formulation given here is taken from [TT05b].

Lemma 3.8 (Block-product/substitution principle).

Let $\Gamma$ be a class of FO+MOD[<] sentences and $\Lambda$ a class of FO+MOD[<] formulas with
one free variable. If V, W are pseudovarieties of finite monoids such that $L(\Gamma) = L(\mathbf{V})$ and
$P(\Lambda) = P(\mathbf{W})$, then $L(\Gamma \circ \Lambda) = L(\mathbf{V} \Box \mathbf{W})$.

Proof. Since L(V □ W) is closed under Boolean combinations, the left-to-right containment
follows if we show that for any $\psi \in \Gamma$ and any $\Lambda$-substitution $\sigma$ we have
$L_{\sigma(\psi)} \in L(\mathbf{V} \Box \mathbf{W})$.
Let $w$ be some word in $\Sigma^*$ and $\Phi = \{\phi_1, \dots, \phi_k\}$ be the formulas used by $\sigma$. Since
$P(\mathbf{W}) = P(\Lambda)$, the pointed languages $P_{\phi_j} = \{(w,i) : (w,i) \models \phi_j\}$ can be recognized by
monoids $N_1, \dots, N_k$ in W and $N = N_1 \times \dots \times N_k$ recognizes any Boolean combination of
them. This implies the existence of two morphisms $h_l, h_r : \Sigma^* \to N$ such that the
membership of a pointed word $(w, i)$ in each $P_{\phi_j}$ can be determined by the value of the triple
$(h_l(w_1 \dots w_{i-1}), w_i, h_r(w_{i+1} \dots w_n))$. Using these two homomorphisms, we therefore obtain
an $N$-transduction $\tau$ such that for each $i$, the value of $\tau(w_i)$ is sufficient to determine the
set $\{\phi_j : (w,i) \models \phi_j\}$. Since we assume that $L_\psi$ is recognized by a monoid $M$ in V, we
get that $L_{\sigma(\psi)} = \tau^{-1}(K)$ for some $K \subseteq (N \times \Sigma \times N)^*$ also recognized by $M$. Hence, by
Theorem 3.7, $L_{\sigma(\psi)} \in L(\mathbf{V} \Box \mathbf{W})$.

For the right-to-left containment, we need to show that any language of L(V □ W) can
be described by a sentence of $\Gamma \circ \Lambda$ and we proceed similarly. If $\tau$ is an $N$-transduction for
some $N \in \mathbf{W}$ then for any triple $(n_1, a, n_2) \in N \times \Sigma \times N$, the pointed language

$T_{(n_1,a,n_2)} = \{(w,i) : \tau(w_i) = (n_1,a,n_2)\}$

is in P(W) and is thus definable by some formula $\phi_{(n_1,a,n_2)}$ in $\Lambda$. Consider the substitution
$\sigma$ defined by these $\phi_{(n_1,a,n_2)}$ and note that for any $w \in \Sigma^*$ and any position $1 \leq i \leq |w|$,
exactly one of the $\phi_{(n_1,a,n_2)}$ is true at $i$. Hence $\sigma^{-1}(w_i) = \{\phi_{(n_1,a,n_2)} : (w,i) \models \phi_{(n_1,a,n_2)}\}$ is
always a singleton and the range of possible values can be identified with the set $N \times \Sigma \times N$.
Any language $K \subseteq (N \times \Sigma \times N)^*$ in L(V) is definable by some sentence $\psi_K \in \Gamma$. Now
the set of words such that $\tau(w) \in K$ is defined by the sentence obtained from $\psi_K$ by applying
the substitution $\sigma$. □

Many results giving algebraic characterizations of regular languages defined in a fragment
$\Lambda$ of FO+MOD[<] more or less explicitly rely on some form of this lemma. It is
often rather easy to characterize algebraically the expressivity of very weak fragments of
FO+MOD[<] (e.g. Lemma 3.5). Furthermore, sufficiently robust classes $\Lambda$ can typically be
decomposed through iterated substitutions of these weak fragments. Applying the
block-product/substitution principle we are thus able to characterize the expressive power of $\Lambda$
by analyzing an iterated block-product.

For a number of reasons, however, this paradigm cannot be applied in full generality.

• Straubing showed that the classes of regular languages definable in the most natural
fragments of MSO[<] (in particular, fragments of first-order logic defined by quantifier
type, quantifier alternation, quantifier depth, number of variables and so on) are all
$\mathcal{C}_{lm}$-varieties of languages (see [Str02] for a formal definition) and are varieties of
languages in the sense we defined earlier when we consider subclasses of FO+MOD[<].
Thus the expressive power of these fragments has some algebraic characterization.
Preliminary investigations unfortunately indicate that classes of pointed languages
definable in similar fragments only rarely correspond to P(V) for some pseudovariety
V, as we noted after Lemma 3.5. This state of affairs limits the possible range
of applications of the block-product/substitution principle.
• We mentioned in Section 2 that the substitution operator is associative and the
block-product/substitution principle might lead one to find this fact in apparent
contradiction with the non-associativity of the block-product. If $\Gamma$ is a class of
sentences and $\Lambda$, $\Phi$ are classes of formulas with at most one free variable, then indeed
we have $\Gamma \circ (\Lambda \circ \Phi) = (\Gamma \circ \Lambda) \circ \Phi$. Suppose that U, V, W are pseudovarieties such
that $L(\Gamma) = L(\mathbf{U})$, $P(\Lambda) = P(\mathbf{V})$ and $P(\Phi) = P(\mathbf{W})$; then the principle ensures
first that $L(\Gamma \circ \Lambda) = L(\mathbf{U} \Box \mathbf{V})$ and, with a second application, that
$L(\Gamma \circ (\Lambda \circ \Phi)) = L((\mathbf{U} \Box \mathbf{V}) \Box \mathbf{W})$. However, we cannot in general infer
$P(\Lambda \circ \Phi) = P(\mathbf{V} \Box \mathbf{W})$.

4. Classical Results from the Block-Product/Substitution Principle

4.1. Quantifier Depth. 

Let us first see how the block-product/substitution principle can provide a proof of
McNaughton and Papert's characterization of FO[<] and Straubing, Thérien and Thomas'
characterization of FO+MOD[<]. For a sentence $\psi$ and a variable $x$ not occurring in
$\psi$, let $\psi_{[<x]}$ and $\psi_{[>x]}$ respectively denote the formulas obtained from $\psi$ by restricting the
scope of any quantified variable of $\psi$ to values respectively strictly less than $x$ and strictly
greater than $x$. We rely on the following lemma.

Lemma 4.1 ([STT95]). Any FO+MOD[<] formula $\phi(x)$ with a single free variable $x$ can
be rewritten as a boolean combination of formulas of the form $Q_a x \wedge \rho_{[<x]} \wedge \chi_{[>x]}$ such that
the sentences $\rho$ and $\chi$ have the same quantifier signature as $\phi$, i.e. their quantifier depths
are equal and the order in which the different types of quantifiers (existential,
universal, mod m counting) are nested is the same.

The proof is not conceptually difficult. By renaming variables we can assume that
$\phi(x)$ does not contain any bound occurrence of $x$ and this is the starting point of the
construction. This rather trivial observation does not necessarily hold, however, when the
sentences considered are only allowed a bounded number of variables as in Section 5.

Let us first focus on the problem of definability in FO[<]. Let $SD_k$ and $FD_k$ respectively
denote the class of FO[<] sentences of quantifier depth k and FO[<] formulas of
quantifier depth k with at most one free variable. By definition of quantifier depth, we have
$SD_{k+1} = SD_1 \circ FD_k$.

Theorem 4.2. Let $\mathbf{V}_1 = \mathbf{SL}$ and $\mathbf{V}_{k+1} = \mathbf{SL} \Box \mathbf{V}_k$. A regular language L can be defined
by an FO[<] sentence of quantifier depth k if and only if its syntactic monoid M lies in
$\mathbf{V}_k$. In other words, $L(SD_k) = L(\mathbf{V}_k)$.

Proof sketch. We argue by induction on k. The base case is provided by Lemma 3.5.
For the induction step we use the fact that $SD_{k+1} = SD_1 \circ FD_k$. Since we know that
$L(SD_1) = L(\mathbf{SL})$ our inductive claim follows from the block-product/substitution principle
if we can show that $P(FD_k) = P(\mathbf{V}_k)$. This is precisely what Lemma 4.1 allows: any
formula $\phi(x) \in FD_k$ can be rewritten as a boolean combination of formulas of the form
$Q_a x \wedge \rho_{[<x]} \wedge \chi_{[>x]}$ where $\rho$ and $\chi$ are sentences of $SD_k$. □

Membership of M in any individual $\mathbf{V}_k$ is trivially decidable because these pseudovarieties
are effectively locally finite [Pin97]: for each t, k we can effectively construct a monoid
$M_{t,k}$ such that any monoid $M \in \mathbf{V}_k$ with at most t generators is a divisor of $M_{t,k}$. On the
logical side, this is essentially equivalent to noting that over a given alphabet there are, up to
logical equivalence, only finitely many first-order sentences of any fixed quantifier depth. The
decidability of an individual $\mathbf{V}_k$ is in itself of moderate interest since one is typically not so interested in
determining whether L is definable in some specific quantifier depth but rather whether L is
FO[<]-definable at all. In algebraic terms, we are more interested in deciding membership
of M in the union of the $\mathbf{V}_k$ than in some specific $\mathbf{V}_k$. By part (1) of Theorem 3.6 we know
that $\bigcup_k \mathbf{V}_k = \mathbf{A}$ and this provides the missing link to obtain the theorem of McNaughton
and Papert which we combine with Schützenberger's Theorem to obtain:

Corollary 4.3. L(FO[<]) = L(A) = SF. 

Thus, deciding if a language K is FO[<]-definable is equivalent to testing if K's syntactic
monoid is aperiodic. The latter problem is clearly decidable and is in fact PSPACE-complete
when K is specified by a finite automaton [CH91].
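The decision procedure behind this remark can be sketched naively as follows. The Python code below assumes the input is a complete, minimal deterministic automaton (so that its transition monoid is the syntactic monoid), enumerates that monoid explicitly and checks that every element x satisfies x^n = x^(n+1) for some n. The function names and the two example automata are illustrative assumptions; the enumeration may be exponential in the number of states, which is consistent with the PSPACE-completeness mentioned above.

```python
def compose(f, g):
    """Apply f first, then g; transformations are tuples indexed by state."""
    return tuple(g[q] for q in f)


def transition_monoid(delta, alphabet):
    """All transformations of the state set induced by words.

    delta[q][a] is the successor of state q on letter a; states are 0..n-1.
    """
    n_states = len(delta)
    identity = tuple(range(n_states))
    gens = [tuple(delta[q][a] for q in range(n_states)) for a in alphabet]
    elements, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in gens:
            h = compose(f, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def is_aperiodic(elements):
    """True iff every element x satisfies x^n = x^(n+1) for some n."""
    for x in elements:
        powers = []
        p = x
        while p not in powers:
            powers.append(p)
            p = compose(p, x)
        # p is the first repeated power; the cycle is trivial iff it
        # coincides with the last power computed before the repetition.
        if p != powers[-1]:
            return False
    return True


# Assumed example automata (minimal and complete):
# (aa)* over {a}, and a*b* over {a, b} with sink state 2.
EVEN_AS = [{"a": 1}, {"a": 0}]
A_STAR_B_STAR = [{"a": 0, "b": 1}, {"a": 2, "b": 1}, {"a": 2, "b": 2}]
```

In this sketch the automaton for (aa)* yields a two-element group and fails the test, while the one for a*b* passes it, in agreement with the fact that a*b* is star-free.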

The same proof methods also yield an algebraic characterization of languages definable 
in FO + MOD[<] and MOD[<]. 

Theorem 4.4. A language L is FO+MOD[<]-definable if and only if its syntactic monoid
M is solvable. Furthermore, L is MOD[<]-definable if and only if M is a solvable group.

Proof sketch. By part (2) of Lemma 3.5 we have $L(\mathbf{Ab}) = L(MOD_1[<])$ as well as
$P(\mathbf{Ab}) = P(MODF_1[<])$. Using the inductive argument of Theorem 4.2 we get that the
classes of languages definable in MOD[<] (resp. FO+MOD[<]) are those with syntactic
monoids in the smallest pseudovariety V satisfying Ab □ V = V (resp. Ab □ V = V and
SL □ V = V). Theorem 3.6 completes the argument. □

The theorem immediately provides an algorithm to decide expressibility in these two 
logics. The result can be specialized to characterize the expressive power of FO + MOD[<] 
and of MOD[<] sentences when the modular quantifiers are restricted to specific mod- 
uli [HEMl EtHSl ETT95] . 

4.2. Quantifier Alternation. 

Quantifier depth is only one of many possible parameterizations of languages definable 
in FO[<]. In particular it is natural to consider the hierarchy of first-order sentences 
defined by quantifier alternation. The block-product/substitution principle seems to be 
of no use in that case but the question can nevertheless be studied with algebraic and 
combinatorial perspectives. There is a natural parametrization of star-free languages in 
terms of concatenation depth. The Straubing-Thérien hierarchy is defined inductively as
follows: a language over $\Sigma$ has depth 0 if and only if it is $\emptyset$ or $\Sigma^*$. Level k + 1/2 of
the hierarchy consists of unions of languages of the form $L_0 a_1 L_1 a_2 \dots a_t L_t$ where the
$a_i$ are letters of $\Sigma$ and the $L_i$'s are languages of depth k. Finally the (k + 1)st level of the
hierarchy is the boolean closure of the level k + 1/2. This hierarchy is closely related
to the Brzozowski-Cohen or dot-depth hierarchy [CB71] (the precise correspondence was
established by Straubing [Str85, Pin97]). The Straubing-Thérien hierarchy is known to
be infinite and the union of all its levels clearly corresponds to the class SF of star-free
languages.

As usual, let $\Sigma_k[<]$ and $\Pi_k[<]$ denote the subclasses of FO[<] sentences defined by
quantifier alternation.

Theorem 4.5 ([Tho82, PP86]). A language L is definable in $\Sigma_{k+1}[<]$ if and only if L belongs
to level k + 1/2 of the Straubing-Thérien hierarchy.

In fact, the original result of Thomas [Tho82] relates the levels of the Brzozowski-Cohen
dot-depth hierarchy with definability in $\Sigma_k[<, S]$ but the argument can easily be adapted
to obtain the theorem just stated [PP86]. The 'if' part of the theorem is immediate from the
definition of the Straubing-Thérien hierarchy. Thomas' argument for the second half of the
theorem does not involve any algebra and relies instead on Ehrenfeucht-Fraïssé games. In
particular it provides a way to relate FO[<]-definability and star-freeness without resorting
to algebra (this is also true of [MP71]).

It is not hard to show that the kth levels of the Straubing-Therien hierarchy are closed 
under inverse homomorphic images, left and right quotients, union and complementation 
and thus form varieties of languages. The variety theorem therefore guarantees that these 
classes correspond to some pseudovariety of finite monoids. Note in contrast that the k+1/2 
levels do not form varieties of languages since they are not closed under complementation. 
They still are closed under inverse homomorphic images, quotients, union and intersection 
and therefore form what are known as positive varieties of languages. These can also be 
analyzed from an algebraic perspective using ordered syntactic monoids and pseudovarieties 
of ordered monoids [Pin86, Pin97]. The decidability of levels 1/2 and 1 of the
Straubing-Thérien hierarchy follows from Simon's theorem on piecewise-testable languages [Sim75] and
later refinements [Pin95, Pin97]. Level 3/2 is also decidable but considerable work is needed
to establish this deep fact [PW97] (see [GS00] for an independent proof of the decidability
of level 3/2 of the dot-depth hierarchy) and the decidability of level 2 is one of the most
important open problems in algebraic automata theory [GS01, Pi n97[ lPS81j IPW97, P WOH 
ISt^lSW92llWei89j . 

There is in fact a general lesson to be learned from Theorem 4.5. We argued in the
first half of this section that when $\Phi$ is a class of sentences and V is a pseudovariety such
that $L(\Phi) = L(\mathbf{V})$ then the class $\Gamma$ of sentences which are boolean combinations of
sentences of the form $\exists x \, [Q_a x \wedge \psi_{[<x]} \wedge \chi_{[>x]}]$ with $\psi, \chi \in \Phi$ is such that
$L(\Gamma) = L(\mathbf{SL} \Box \mathbf{V})$.
Clearly, the languages in $L(\Gamma)$ are boolean combinations of languages of the form $L_1 a L_2$
with $L_1, L_2 \in L(\Phi)$. These facts provide us with a bridge linking, under the correct
technical assumptions, the logical operation of adding an extra existential quantifier, the
algebraic operation of forming a block-product SL □ V and the combinatorial operation of
concatenation of two languages. The same idea can be extended to obtain combinatorial
and algebraic counterparts to the addition of a whole block of existential quantifiers
$\exists x_1 \dots \exists x_k \, \phi(x_1, \dots, x_k)$ on the logical side.

For a variety of languages V, we define Pol(V) to be the class of languages which are
unions of languages of the form $L_0 a_1 L_1 \dots a_k L_k$ for $L_i \in V$. (Note that in general
Pol(V) is not a variety of languages because it need not be closed under complement.
It does however form a positive variety in the sense of [Pin86, Pin97].) One can put in correspondence
the logical operation of adding a block of existential quantification with the combinatorial
operator of polynomial closure Pol(V) on varieties of languages. In other words, under some
technical conditions, one can show that if $\Phi$ is a class of sentences and V is a language
variety such that $L(\Phi) = V$ then a language K belongs to Pol(V) if and only if it can be
defined as a positive boolean combination of sentences of the form

$\exists x_1 \dots \exists x_k \, [(x_1 < \dots < x_k) \wedge Q_{a_1} x_1 \wedge \dots \wedge Q_{a_k} x_k \wedge \psi^0_{[<x_1]} \wedge \psi^1_{[>x_1,<x_2]} \wedge \dots \wedge \psi^k_{[>x_k]}]$

where each $\psi^i$ is a formula in $\Phi$ and where $\psi^i_{[>x_i,<x_{i+1}]}$ is the formula obtained
from $\psi^i$ by restricting the quantified variables to lie between $x_i$ and $x_{i+1}$.

In turn the operator Pol(V) on varieties of languages is linked to an algebraic operation 
on pseudovarieties of monoids defined in terms of so-called Mal'cev products [PW97]. Sim- 
ilarly, the addition of a block of modular quantifiers is related to the closure of a variety of 
languages under products with counters which can also be described algebraically through 
Mal'cev products [Wei92].

4.3. Sentences with Regular Predicates. 

The numerical predicate successor (S) is definable in FO[<] and so the expressive power
of FO[<,S] (resp. FO+MOD[<,S]) is exactly that of FO[<] (resp. FO+MOD[<]).

The cases of MOD[<,S] and of the different $\Sigma_k[<,S]$, however, are more subtle: the
algebraic characterization of languages definable in these fragments [Str94, STT95] would
require the introduction of the notions of syntactic semigroup, semigroup pseudovarieties 
and H — varieties of languages which we chose to omit. Still, the fundamental tools of the 
analysis are conceptually very similar to the ones we presented in this section. 

If successor is the only available numerical predicate, then the expressive power of
first-order sentences is dramatically reduced. Thomas and later Straubing gave combinatorial
and algebraic descriptions (the latter, again, in terms of syntactic semigroups) of the
languages definable in FO[S] and showed that these form a strict subclass of the star-free
regular languages [Str94, Tho82, Tho97]. The work of Thérien and Weiss [TW85] establishes
the decidability of this class. The cases MOD[S] and FO+MOD[S] are also investigated
in Chapter VI of Straubing's book [Str94].

The extra expressive power afforded by the unary predicate $\equiv_{i,m} x$ (which is true at x
if $x \equiv i \pmod{m}$) has also been considered [CPS06, Str02]. More generally, a numerical
predicate $R \subseteq \mathbb{N}^t$ is said to be regular if it is definable in FO+MOD[<]. Equivalently, R
is regular if it is definable in FO[<, $\{\equiv_{i,m}\}$] (see [Pel92]) and the terminology comes from
yet another equivalent definition of regular predicates using finite automata [Str94]. Let
Reg denote the class of regular numerical predicates: it follows from our definition that
FO+MOD[Reg] ⊆ MSO[<] so this class consists only of regular languages and in fact
regular predicates form the largest class of numerical predicates with this property [Pel92].
By definition, FO+MOD[Reg] = FO+MOD[<] and the expressive power of the fragments
FO[Reg], MOD[Reg] (among others) can be characterized algebraically [BCST92, STT95,
Str94, Pel92].

5. Two-Variable Sentences and Temporal Logic

In the previous section, the application of the block-product/substitution principle was
particularly fruitful because of the decomposition of the pseudovarieties $\mathbf{A}$, $\mathbf{G}_{sol}$, $\mathbf{M}_{sol}$ in
terms of iterated block products of semilattices and Abelian groups (Theorem 3.6). As
we noted, these iterated block products use the strong, right-to-left bracketing whereas the
present section relies on decompositions using the weaker left-to-right bracketing.

5.1. Sentences with a Bounded Number of Variables. 

It is common practice to construct logical sentences in such a way that any subformula
$\phi(x)$ with a free variable $x$ never contains an occurrence of $x$ which is bound by a quantifier.
This certainly avoids possible confusion, although it is quite possible to construct sentences
that do not obey this rule and still get unambiguous semantics by interpreting a variable
as bound by the previous quantifier. We illustrate this in the following two examples:

Example 5.1.

The three-variable sentence of Example 2.5

$\exists x \forall y \, [Q_a x \wedge [(y < x) \Rightarrow \neg Q_a y] \wedge \exists^{0 \bmod 2} z \, [(x < z) \wedge Q_c z]]$

can clearly be rewritten as the two-variable sentence

$\exists x \forall y \, [Q_a x \wedge [(y < x) \Rightarrow \neg Q_a y] \wedge \exists^{0 \bmod 2} y \, [(x < y) \wedge Q_c y]]$.

In many cases, the rewriting is not as trivial.

Example 5.2.

We claim that the following FO[<] sentence can also be rewritten using only two variables.

$\exists x \forall y \exists z \, [Q_a x \wedge [(x < y) \Rightarrow \neg Q_a y] \wedge Q_d z \wedge (x < z) \wedge [(x < y < z) \Rightarrow Q_c y]]$.

This sentence is true for words over $\Sigma = \{a, b, c, d\}$ in which there exists a position x that
holds the last occurrence of a and whose suffix begins with some c's (possibly none) followed
by a d. Thus the sentence defines the language $\Sigma^* a c^* d \{b, c, d\}^*$.

We claim that the following two-variable sentence defines the very same language.

$\exists x \, [Q_a x \wedge [\forall y \, ((x < y) \Rightarrow \neg Q_a y)] \wedge$
$\exists y \, [(x < y) \wedge Q_d y \wedge \forall x \, [((x < y) \wedge \neg Q_c x) \Rightarrow (\exists y \, [(x \leq y) \wedge Q_a y])]]]$.

The first part of this second sentence also identifies x as the location of the last a. To
understand how the rest of the sentence imposes the condition on the suffix of this position,
it is more convenient to look first at the meaning of the most deeply nested subformulas
and work back towards the outermost quantifiers: the most deeply nested subformula

$\phi(x) : \exists y \, [(x \leq y) \wedge Q_a y]$

with free variable x is true at position x if there is an a occurring at x or a later position.
Now,

$\psi(y) : \forall x \, [((x < y) \wedge \neg Q_c x) \Rightarrow (\exists y \, [(x \leq y) \wedge Q_a y])]$

which has y as a free variable is true at position y if all positions x before y that do not
hold a c satisfy the property $\phi(x)$. Finally,

$\eta(x) : \exists y \, [(x < y) \wedge Q_d y \wedge \forall x \, [((x < y) \wedge \neg Q_c x) \Rightarrow (\exists y \, [(x \leq y) \wedge Q_a y])]]$

checks that there is a y > x holding d and satisfying $\psi(y)$. Putting it all together, we see
that if x holds the last a then it satisfies $\eta(x)$ iff its suffix lies in $c^* d \{b, c, d\}^*$. Indeed, any
y occurring after the last a satisfies $\psi(y)$ if and only if all positions between that last a and
y hold a c.

A more formal discussion is given in [ST03].
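The claim of Example 5.2 can be checked by brute force on short words. The Python sketch below evaluates both sentences directly over word positions and compares them with a regular expression for $\Sigma^* a c^* d \{b,c,d\}^*$. It reads the innermost comparison of the two-variable sentence as x ≤ y, as the prose description ("at x or a later position") indicates; the function names and the length bound are assumptions made for this illustration, and agreement on short words is evidence, not a proof.

```python
import re
from itertools import product

SIGMA = "abcd"
TARGET = re.compile(r"[abcd]*ac*d[bcd]*")


def three_variable(w):
    """Exists x, for all y, exists z: the three-variable sentence."""
    pos = range(len(w))

    def matrix(x, y, z):
        return (w[x] == "a"
                and (not x < y or w[y] != "a")
                and w[z] == "d" and x < z
                and (not (x < y < z) or w[y] == "c"))

    return any(all(any(matrix(x, y, z) for z in pos) for y in pos) for x in pos)


def two_variable(w):
    """The two-variable sentence, built from its nested subformulas."""
    pos = range(len(w))

    def phi(x):  # an a occurs at x or later
        return any(x <= y and w[y] == "a" for y in pos)

    def psi(y):  # every non-c position before y satisfies phi
        return all(not (x < y and w[x] != "c") or phi(x) for x in pos)

    def eta(x):  # some later position holds d and satisfies psi
        return any(x < y and w[y] == "d" and psi(y) for y in pos)

    return any(w[x] == "a"
               and all(not x < y or w[y] != "a" for y in pos)
               and eta(x)
               for x in pos)


def agree_up_to(max_len):
    """True iff both sentences match the regular expression on all short words."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            w = "".join(letters)
            expected = TARGET.fullmatch(w) is not None
            if three_variable(w) != expected or two_variable(w) != expected:
                return False
    return True
```

Note how `phi`, `psi` and `eta` reuse only the names x and y, mirroring the way the two-variable sentence rebinds its variables.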

We denote as FO$_k[\mathcal{N}]$, MOD$_k[\mathcal{N}]$ and FO+MOD$_k[\mathcal{N}]$ the different classes of
first-order sentences constructed with at most k distinct variables.

5.1.1. The First-Order Case. 

Kamp showed that a language is star-free if and only if it can be defined in LTL (linear
temporal logic) [Kam68]. We formally describe this logic in the next subsection and show
how an LTL formula can easily be translated into an equivalent FO$_3$[<] sentence. Thus

Theorem 5.3. L(FO[<]) = L(FO$_3$[<]) = L(LTL) = L(A) = SF.

Lemma 3.5 provides us with a characterization of the expressive power of FO$_1$[<]. The
case of two-variable sentences was first studied by Etessami, Vardi and Wilke who showed
that a language is definable in FO$_2$[<] if and only if it can be defined in unary temporal
logic, i.e. by an LTL sentence using only unary temporal operators [EVW02]. The problem
of deciding whether a language is definable in this logic was later settled through the
algebraic characterization of this class, given by Thérien and Wilke [TW98].

Let us quickly review the mechanics of our proofs in Section 4. We decompose sentences
of quantifier depth k + 1 as images of sentences of depth 1 under a substitution of formulas
of quantifier depth k. Since by Lemma 4.1 any formula $\phi(x)$ of quantifier depth k can be
written as a boolean combination of formulas of the form $Q_a x \wedge \rho_{[<x]} \wedge \chi_{[>x]}$ we can conclude
that the pointed languages definable by such formulas are exactly the pointed languages in
$P(\mathbf{V}_k)$ and this makes our inductive proof possible.

In the case of two-variable sentences, we cannot hope to find an analog of Lemma 4.1: if
$\rho$ is a sentence using only two variables x, y it is not possible to construct the relativization
$\rho_{[<x]}$ without introducing new variables. To circumvent this problem we choose to
decompose two-variable sentences of depth k + 1 as the images of sentences of depth k under a
substitution of formulas of quantifier depth 1.

When considering substitutions in the two-variable context, we need to worry about
preserving the two-variable property. In other words, if $\Lambda$ is a class of two-variable sentences
and $\Gamma$ is a class of two-variable formulas with at most one free variable, we denote as $\Lambda \circ \Gamma$ the
class of sentences which are boolean combinations of sentences in $\Gamma$ and sentences obtained
from a $\Lambda$ sentence by replacing each occurrence of a predicate $Q_a x$ (resp. $Q_a y$) by a formula
$\phi_a(x)$ of $\Gamma$ (resp. $\phi_a(y)$). The block-product/substitution principle still holds true under
this restricted notion of substitution [TT05b].

While we analyzed FO[<] sentences by starting from the outermost quantifiers, it is
much more convenient to begin our study of a two-variable FO$_2$[<] sentence $\phi$ by looking
at an innermost quantifier. Indeed, since $\phi$ uses only two variables, its most deeply nested
subformula containing a quantifier is always of the form $\exists y \, \psi(x,y)$ or $\exists x \, \psi(x,y)$, where
$\psi(x,y)$ is quantifier-free. We therefore isolate the FOF$_1$[<] subformulas of $\phi$ which are
boolean combinations of formulas of the form $\exists y \, [(x * y) \wedge Q_a y]$ for $* \in \{<, >, =\}$ and
formulas of this form with the roles of x and y reversed.

Let $Q_{2,k}$ denote the class of FO$_2$[<] sentences of quantifier depth at most k. From the
observations of the previous paragraph we have $Q_{2,k+1} = Q_{2,k} \circ$ FOF$_1$[<] and one obtains

Lemma 5.4. Let $\mathbf{W}_1 = \mathbf{SL}$ and $\mathbf{W}_{i+1} = \mathbf{W}_i \Box \mathbf{SL}$ for each $i \geq 1$. Then for each $k \geq 1$ we
have $L(\mathbf{W}_k) = L(Q_{2,k})$.

Proof. The proof is a straightforward induction. The base case

$L(\mathbf{W}_1) = L(\mathbf{SL}) = L(\mathrm{FO}_1[<]) = L(Q_{2,1})$

is given by Lemma 3.5.

For the induction step, assume $L(\mathbf{W}_k) = L(Q_{2,k})$. We know by Lemma 3.5 that
$P(\mathbf{SL}) = P(\mathrm{FOF}_1[<])$ and by the block-product/substitution principle

$L(Q_{2,k+1}) = L(Q_{2,k} \circ \mathrm{FOF}_1[<]) = L(\mathbf{W}_k \Box \mathbf{SL}) = L(\mathbf{W}_{k+1})$. □

Thus, a language L is definable by an FO$_2$[<] sentence if and only if its syntactic
monoid M belongs to one of the pseudovarieties

$\mathbf{W}_k = (\dots((\mathbf{SL} \Box \mathbf{SL}) \Box \mathbf{SL}) \Box \dots \mathbf{SL})$ (k times).

Note that this iterated block product uses the weaker left-to-right bracketing. The union
of the $\mathbf{W}_k$ is the smallest pseudovariety W satisfying W □ SL = W. Let DA denote the
pseudovariety of monoids satisfying $(xy)^\omega y (xy)^\omega = (xy)^\omega$.

Theorem 5.5 ([ST02]). The pseudovariety DA is the smallest satisfying DA □ SL = DA.

Combining this result with Lemma 5.4 we get the following theorem of Thérien and
Wilke [TW98]:

Corollary 5.6. L(FO$_2$[<]) = L(DA).

This immediately provides an algorithm for deciding if a regular language is definable by
an FO$_2$[<] sentence because the pseudovariety DA is decidable. This pseudovariety admits
a number of interesting characterizations [TT02] and, in particular, the regular languages
whose syntactic monoids lie in DA have a nice combinatorial description. In fact, the original
proof of Corollary 5.6 relied upon this characterization rather than on the decomposition
of DA in terms of weakly iterated block products. For regular languages $L_0, \dots, L_k \subseteq \Sigma^*$
and letters $a_1, \dots, a_k \in \Sigma$, we say that the concatenation $L = L_0 a_1 L_1 \dots a_k L_k$ is
unambiguous if for each $w \in L$ there exists a unique factorization of w as $w = w_0 a_1 w_1 \dots a_k w_k$ with
$w_i \in L_i$.
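Unambiguity can be explored experimentally by counting factorizations. The Python sketch below does this for the concatenation $\Sigma^* \, a \, c^* \, d \, \{b,c,d\}^*$ over $\Sigma = \{a,b,c,d\}$ (the language of Example 5.2) and, for contrast, for $\Sigma^* a \Sigma^*$. The function names and the bound on word length are assumptions chosen for illustration; a bounded search can only suggest unambiguity, not prove it.

```python
from itertools import product

SIGMA = "abcd"


def factorizations(w):
    """All (w0, w1, w2) with w = w0 a w1 d w2, w1 in c*, w2 in {b,c,d}*."""
    found = []
    for i, x in enumerate(w):
        if x != "a":
            continue
        for j in range(i + 1, len(w)):
            if w[j] != "d":
                continue
            w1, w2 = w[i + 1:j], w[j + 1:]
            if set(w1) <= {"c"} and set(w2) <= set("bcd"):
                found.append((w[:i], w1, w2))
    return found


def factorizations_a(w):
    """All (w0, w1) with w = w0 a w1 and no constraint on w0, w1."""
    return [(w[:i], w[i + 1:]) for i, x in enumerate(w) if x == "a"]


def max_factorizations(count, max_len):
    """Largest number of factorizations of any word of length <= max_len."""
    best = 0
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            best = max(best, len(count("".join(letters))))
    return best
```

On short words the first concatenation never admits more than one factorization, whereas $\Sigma^* a \Sigma^*$ already has two for the word aa.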

Theorem 5.7 ([Sch76]). A language $L \subseteq \Sigma^*$ has its syntactic monoid in DA if and only if
L is the disjoint union of unambiguous concatenations of the form $\Sigma_0^* a_1 \Sigma_1^* \dots a_k \Sigma_k^*$, where
$a_i \in \Sigma$ and $\Sigma_i \subseteq \Sigma$.

Furthermore, Pin and Weil show that L lies in L(DA) if and only if both L and its
complement lie in the second level of the Straubing-Thérien hierarchy. Thus, L is definable
in FO$_2$[<] if and only if it is definable in both $\Sigma_2[<]$ and $\Pi_2[<]$.

Theorem 5.8 ([TW98]). FO$_2$[<] = $\Sigma_2[<] \cap \Pi_2[<]$.

Example 5.9.

In Example 5.2 we gave two first-order sentences defining the language $L = \Sigma^* a c^* d \{b, c, d\}^*$,
the second of which was FO$_2$[<]. Note first that this concatenation is unambiguous: if a
word w belongs to L then there is a unique factorization $w = w_0 a w_1 d w_2$ such that $w_1 \in c^*$
and $w_2 \in \{b, c, d\}^*$ because $w_1$ must start right after the last occurrence of the letter a in
w and must end at the first occurrence of d after this a. Hence, the syntactic monoid of L
lies in DA. We can also define L using the following $\Sigma_2[<]$ sentence which simply reflects
the structure of the regular expression for L:

$\exists x \exists y \forall z \, [(x < y) \wedge Q_a x \wedge Q_d y \wedge [(x < z < y) \Rightarrow Q_c z] \wedge [(z > y) \Rightarrow (Q_b z \vee Q_c z \vee Q_d z)]]$

But the following $\Pi_2[<]$ sentence also defines L:

$\forall x \forall y \forall z \exists s \exists t \exists u \, [Q_a t \wedge Q_d u \wedge$
$[((x < y) \wedge Q_a x \wedge Q_d y) \Rightarrow ([(x < z < y) \Rightarrow Q_c z] \vee ((x < s) \wedge Q_a s) \vee ((x < s < y) \wedge Q_d s))]]$

Indeed, this sentence relies on the fact that a word belongs to L if it contains at least one a,
contains at least one d and is such that for any position x holding a and any later y holding
d either all positions between x and y hold c or there exists an a occurring later than x or
a d occurring between x and y.

Example 5.10.

We gave in Example 2.1 a $\Sigma_2[<]$ sentence defining the language $K = \{a, b, c\}^* a c^* a \{a, b, c\}^*$.
Elementary computations show that the syntactic monoid $U$ of $K$ consists of the six
elements $\{1, a, b, ab, ba, 0\}$ with multiplication specified by $aa = 0$, $bb = b$, $aba = a$, $bab = b$
and $0u = u0 = 0$ for all $u \in U$. In particular, $ba$ is idempotent and if $x = b$ and $y = a$, we
have

$$(ba)^\omega a (ba)^\omega = baaba = 0 \neq (ba)^\omega.$$

Thus $U$ does not belong to $\mathbf{DA}$ and $K$ cannot be defined by a $\Pi_2[<]$ sentence or by an
$\mathrm{FO}_2[<]$ sentence.
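The "elementary computations" of this example can be mechanized. The sketch below computes the transition monoid of a deterministic automaton and searches for violations of the identity $(xy)^\omega y (xy)^\omega = (xy)^\omega$, which is the form of the $\mathbf{DA}$ condition used in the computation above. The three-state automaton for $K$ is our own encoding (state 0: no pending $a$; state 1: the last letter other than $c$ was an $a$; state 2: accepting sink), and all function names are illustrative. Since this automaton is minimal, its transition monoid is the syntactic monoid.

```python
from itertools import product


def compose(f, g):
    """Transformation 'f then g'; maps are tuples indexed by state."""
    return tuple(g[q] for q in f)


def transition_monoid(generators):
    """Close the identity map under right multiplication by the letter maps."""
    n_states = len(next(iter(generators.values())))
    identity = tuple(range(n_states))
    elements, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()
        for g in generators.values():
            p = compose(m, g)
            if p not in elements:
                elements.add(p)
                frontier.append(p)
    return elements


def omega(m):
    """The unique idempotent power of m."""
    p = m
    while compose(p, p) != p:
        p = compose(p, m)
    return p


def da_violations(elements):
    """Pairs (x, y) for which (xy)^w y (xy)^w differs from (xy)^w."""
    bad = []
    for x, y in product(elements, repeat=2):
        e = omega(compose(x, y))
        if compose(compose(e, y), e) != e:
            bad.append((x, y))
    return bad


# Assumed encoding of the minimal automaton of K = {a,b,c}* a c* a {a,b,c}*.
K_LETTERS = {"a": (1, 2, 2), "b": (0, 0, 2), "c": (0, 1, 2)}
U = transition_monoid(K_LETTERS)
```

Here `len(U)` gives the number of elements of $U$ and `da_violations(U)` lists the witnesses showing that $U$ lies outside $\mathbf{DA}$; the pair $x = b$, $y = a$ used above is among them.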

5.1.2. Two-Variable Sentences with Modular Quantifiers. 

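To characterize the expressive power of $\mathrm{MOD}_2[<]$ and $\mathrm{FO+MOD}_2[<]$ sentences, we
can precisely follow the proof paradigm used in the $\mathrm{FO}_2[<]$ case above. Since we have
$L(\mathrm{MOD}_1[<]) = L(\mathbf{Ab})$ and $P(\mathrm{MOD}_1[<]) = P(\mathbf{Ab})$ we are naturally led to consider the
smallest pseudovariety $\mathbf{V}$ such that $\mathbf{V} \Box \mathbf{Ab} = \mathbf{V}$ (for the $\mathrm{MOD}_2[<]$ case) and the smallest
pseudovariety $\mathbf{W}$ such that $\mathbf{W} \Box \mathbf{Ab} = \mathbf{W}$ and $\mathbf{W} \Box \mathbf{SL} = \mathbf{W}$ (for the $\mathrm{FO+MOD}_2[<]$ case).

Theorem 5.11 ([ST02]). The pseudovariety $\mathbf{G}_{sol}$ is the smallest pseudovariety satisfying
$\mathbf{G}_{sol} \Box \mathbf{Ab} = \mathbf{G}_{sol}$.

The pseudovariety $\mathbf{DA} \Box \mathbf{G}_{sol}$ is the smallest pseudovariety satisfying
$(\mathbf{DA} \Box \mathbf{G}_{sol}) \Box \mathbf{Ab} = \mathbf{DA} \Box \mathbf{G}_{sol}$ and $(\mathbf{DA} \Box \mathbf{G}_{sol}) \Box \mathbf{SL} = \mathbf{DA} \Box \mathbf{G}_{sol}$.

This theorem yields

Corollary 5.12 ([ST03]). A language $L$ is definable in $\mathrm{FO+MOD}_2[<]$ if and only if its
syntactic monoid $M(L)$ lies in $\mathbf{DA} \Box \mathbf{G}_{sol}$ and is furthermore definable in $\mathrm{MOD}_2[<]$ if
$M(L)$ is a solvable group.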



Note that despite the similarity, the monoid $U$ of Example 5.10 is not isomorphic to the syntactic monoid $B_2$ of $(ab)^*$
because $bb = b$ in $U$ and $bb = 0$ in $B_2$.
Let us denote as $\Sigma_2 \circ \mathrm{MODF}[<]$ the class of $\mathrm{FO+MOD}[<]$ sentences which, as the
terminology suggests, are positive boolean combinations of sentences obtained by applying
to a $\Sigma_2[<]$ sentence a substitution using formulas containing only modular quantifiers. We
define $\Pi_2 \circ \mathrm{MODF}[<]$ similarly. Straubing and Thérien obtained the following analog of
Theorem 5.8.

Theorem 5.13 ([ST03]). $(\Sigma_2 \circ \mathrm{MODF}[<]) \cap (\Pi_2 \circ \mathrm{MODF}[<]) = \mathrm{FO+MOD}_2[<]$.

Although Corollary 5.12 gives an exact algebraic characterization of $\mathrm{FO+MOD}_2[<]$,
it does not provide an effective way of testing whether a given regular language is definable in
this logic because the pseudovariety $\mathbf{DA} \Box \mathbf{G}_{sol}$ is not known to be decidable. We have
$\mathbf{DA} \Box \mathbf{G}_{sol} \subseteq (\mathbf{DA} \Box \mathbf{G}) \cap \mathbf{M}_{sol}$ and the latter two pseudovarieties are decidable but the
containment is strict. Straubing and Thérien show that $\mathbf{DA} \Box \mathbf{G}_{sol}$ is decidable if and
only if the smaller pseudovariety $\mathbf{SL} \Box \mathbf{G}_{sol}$ is decidable [ST03]. The latter question is an
outstanding open problem in combinatorial group theory with deep implications [MSW01].

Example 5.14.

Let us once again consider the language $L = (ab)^*$. Recall that $L$'s syntactic monoid is
the six element monoid $B_2 = \{1, a, b, ab, ba, 0\}$ whose multiplication is specified by $aba = a$,
$bab = b$, $aa = 0$, $bb = 0$ and $x0 = 0x = 0$ for all $x \in B_2$. We mentioned that $B_2$ is aperiodic
and gave an $\mathrm{FO}[<]$ sentence defining $L$. However, in $B_2$ we have $(ab)^\omega b (ab)^\omega = 0 \neq (ab)^\omega$
and so $B_2 \notin \mathbf{DA}$. Hence, $L$ is not definable in $\mathrm{FO}_2[<]$. On the other hand one can show that
$B_2$ belongs to the pseudovariety $\mathbf{DA} \Box \mathbf{G}_{sol}$. While we could argue for this fact in algebraic
terms, it is sufficient to show that the language $L$ can be defined by an $\mathrm{FO+MOD}_2[<]$
sentence. The language $(ab)^*$ consists of words of even length with $a$ on every odd position
and $b$ on every even position so it is defined by the two-variable sentence

$$(\exists^{0 \bmod 2} x \, T) \wedge \forall x \, \big[ (Q_a x \rightarrow \exists^{0 \bmod 2} y \, (y < x)) \wedge (Q_b x \rightarrow \exists^{1 \bmod 2} y \, (y < x)) \big].$$

This sentence is in fact $\Pi_1 \circ \mathrm{MODF}_1[<]$.
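The semantics of the modular quantifiers in this sentence can be made concrete with a short Python sketch: $\exists^{r \bmod 2} y\, \theta(y)$ holds when the number of positions $y$ satisfying $\theta$ is congruent to $r$ modulo 2. The helper names below are ours, and the comparison with the regular expression `(ab)*` on short words over $\{a, b\}$ is only a finite check.

```python
import re
from itertools import product


def count_mod(witnesses, residue, modulus=2):
    """Modular quantifier: the number of witnesses is = residue (mod modulus)."""
    return sum(1 for _ in witnesses) % modulus == residue


def satisfies_sentence(w):
    """Evaluate the two-variable FO+MOD sentence of the example on w in {a,b}*."""
    n = len(w)
    even_length = count_mod(range(n), 0)  # an even number of positions overall

    def position_ok(x):
        earlier = range(x)  # the positions y with y < x
        if w[x] == "a":
            return count_mod(earlier, 0)
        return count_mod(earlier, 1)  # w[x] == "b"

    return even_length and all(position_ok(x) for x in range(n))


def in_ab_star(w):
    return re.fullmatch(r"(ab)*", w) is not None


def agree_up_to(max_len):
    """True if the sentence and the regular expression agree up to max_len."""
    return all(
        satisfies_sentence("".join(t)) == in_ab_star("".join(t))
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
    )
```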

In particular this example proves that $(\mathbf{DA} \Box \mathbf{G}_{sol}) \cap \mathbf{A} \neq \mathbf{DA}$ and so, somewhat
counter-intuitively, there are star-free languages, i.e. $\mathrm{FO}[<]$ definable languages, which are
not definable in $\mathrm{FO}_2[<]$ but are definable in $\mathrm{FO+MOD}_2[<]$. On the other hand the
syntactic monoid $U$ presented in Example 5.10 is the smallest aperiodic monoid that does
not lie in $\mathbf{DA} \Box \mathbf{G}_{sol}$ and so $\Sigma^* a c^* a \Sigma^*$ is definable in $\mathrm{FO}[<]$ but not in $\mathrm{FO+MOD}_2[<]$.

One can extend Corollary 5.12 to show that the pointed languages definable by a
$\mathrm{MODF}_2[<]$ formula are exactly the pointed languages recognized by solvable groups. This
yields an interesting corollary: any two-variable $\mathrm{FO+MOD}_2[<]$ sentence is equivalent to
a two-variable sentence in which no existential or universal quantifier appears in the scope
of a modular quantifier. Indeed this class of sentences is just $\mathrm{FO}_2[<] \circ \mathrm{MODF}_2[<]$ and,
once again, the block-product/substitution principle yields

$$L(\mathrm{FO}_2[<] \circ \mathrm{MODF}_2[<]) = L(\mathbf{DA} \Box \mathbf{G}_{sol}) = L(\mathrm{FO+MOD}_2[<]).$$

In fact, it is possible to provide explicit rules for rewriting a two-variable $\mathrm{FO+MOD}$ sentence
so that all modular quantifiers are pushed within the scope of existential and universal
quantifiers [ST03] but the detour through algebra avoids the technical complications of this
construction.

Note that since the negation of a $\Sigma_2[<]$ sentence is not in general a $\Sigma_2[<]$ sentence, we must avoid
negation in the definition of the class $\Sigma_2 \circ \mathrm{MODF}[<]$.

Note that this equivalence does not hold in the case of sentences with an unbounded number of variables.

It is natural to ask whether one can symmetrically rewrite any $\mathrm{FO+MOD}_2[<]$ sentence
such that no modular quantifier lies in the scope of an existential or universal quantifier.
In other words, we would like to understand the expressive power of the class of sentences
$\mathrm{MOD}_2[<] \circ \mathrm{FOF}_2[<]$. Unfortunately we cannot directly use the block-product/substitution
principle because we do not have an algebraic characterization of the class of pointed
languages $P(\mathrm{FOF}_2[<])$. Rather, we choose to view this class of sentences as the union over
all $k$ of the classes

$$\mathrm{MOD}_2[<] \circ \underbrace{\mathrm{FOF}_1[<] \circ \cdots \circ \mathrm{FOF}_1[<]}_{k \text{ times}}.$$

Since $P(\mathrm{FOF}_1[<]) = P(\mathbf{SL})$ it follows that a language $L$ is definable by an $\mathrm{FO+MOD}_2[<]$
sentence in which no modular quantifier appears in the scope of an existential or universal
quantifier if and only if the syntactic monoid $M(L)$ lies in one of the pseudovarieties

$$\mathbf{S}_k = (\ldots((\mathbf{G}_{sol} \Box \mathbf{SL}) \Box \mathbf{SL}) \ldots) \Box \mathbf{SL} \quad (k \text{ times});$$

and is furthermore definable by an $\mathrm{FO+MOD}_2[<]$ sentence in which no modular quantifier
appears in the scope of any other quantifier if and only if $M(L)$ lies in one of the

$$\mathbf{T}_k = (\ldots((\mathbf{Ab} \Box \mathbf{SL}) \Box \mathbf{SL}) \ldots) \Box \mathbf{SL} \quad (k \text{ times}).$$

It is possible to show that for any $k$ the pseudovarieties $\mathbf{S}_k$, $\mathbf{T}_k$ are decidable using
the notion of kernels of monoid morphisms [Til87] (see also [TW04] for an application to
logic). In any case, we are once again more interested in deciding membership in the union
of the $\mathbf{S}_k$ or the $\mathbf{T}_k$. Let $\mathbf{DO}$ be the pseudovariety of finite monoids satisfying the identity

$$(xy)^\omega (yx)^\omega (xy)^\omega = (xy)^\omega.$$

Lemma 5.15 ([TT05b]). Let $\mathbf{DO} \cap \mathbf{M}_{sol}$ and $\mathbf{DO} \cap \mathbf{Ab}$ denote the pseudovarieties consisting
of monoids in $\mathbf{DO}$ whose subgroups are respectively solvable and Abelian. Then
$\bigcup_k \mathbf{S}_k = \mathbf{DO} \cap \mathbf{M}_{sol}$ and $\bigcup_k \mathbf{T}_k = \mathbf{DO} \cap \mathbf{Ab}$.

This immediately yields 

Corollary 5.16. A language $L$ is definable by an $\mathrm{FO+MOD}_2[<]$ sentence in which no
modular quantifier appears in the scope of an existential or universal quantifier if and only
if its syntactic monoid $M$ lies in $\mathbf{DO} \cap \mathbf{M}_{sol}$, and is definable by a sentence in which no modular
quantifier appears in the scope of another quantifier if and only if $M$ lies in $\mathbf{DO} \cap \mathbf{Ab}$.

Example 5.17.

Let us return to Example 2.4. It can be explicitly shown that the syntactic monoid $M(K)$
of $K = (b^* a b^* a)^* b \Sigma^*$ lies in $\mathbf{DA} \Box \mathbf{G}_{sol}$ and, correspondingly, there exists an $\mathrm{FO+MOD}_2[<]$
sentence defining $K$:

$$\exists x \, \big( Q_b x \wedge \exists^{0 \bmod 2} y \, [(y < x) \wedge Q_a y] \big).$$

The modular quantifier lies in the scope of the existential quantifier and we want to show
that it cannot be pulled out. Indeed, by simple calculation one can see that $M(K)$ contains
elements $\{1, a, b, ab, ba, aba, 0\}$ with multiplication given by $aa = 1$, $bb = b$, $bab = 0$ (hence $abab = 0$)
and $0s = s0 = 0$ for all $s$. In particular $aba$ and $b = baa$ are idempotents. Choosing $u = a$
and $v = ba$, we have

$$(uv)^\omega (vu)^\omega (uv)^\omega = (aba)^\omega (baa)^\omega (aba)^\omega = abababa = 0 \neq (aba)^\omega = (uv)^\omega$$

so $M(K)$ violates the identity defining $\mathbf{DO}$ and $K$ cannot be defined by an $\mathrm{FO+MOD}_2[<]$
sentence in which the modular quantifiers lie outside the scope of the ordinary quantifiers.

The same type of argument also shows that (ab)* cannot be defined by an FO + 
MOD2[<] sentence in which the modular quantifiers lie outside the scope of the ordinary 
quantifiers. 
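The failure of the $\mathbf{DO}$ identity in these two examples can be checked mechanically. The sketch below computes the transition monoid of an automaton and lists the pairs $(x, y)$ violating $(xy)^\omega (yx)^\omega (xy)^\omega = (xy)^\omega$. The automaton for $K = (b^* a b^* a)^* b \Sigma^*$ over $\{a, b\}$ is our own encoding (state 0: an even number of $a$'s read so far; state 1: an odd number; state 2: accepting sink), and the function names are illustrative.

```python
from itertools import product


def compose(f, g):
    """Transformation 'f then g'; maps are tuples indexed by state."""
    return tuple(g[q] for q in f)


def transition_monoid(generators):
    """Close the identity map under right multiplication by the letter maps."""
    n_states = len(next(iter(generators.values())))
    identity = tuple(range(n_states))
    elements, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()
        for g in generators.values():
            p = compose(m, g)
            if p not in elements:
                elements.add(p)
                frontier.append(p)
    return elements


def omega(m):
    """The unique idempotent power of m."""
    p = m
    while compose(p, p) != p:
        p = compose(p, m)
    return p


def do_violations(elements):
    """Pairs (x, y) for which (xy)^w (yx)^w (xy)^w differs from (xy)^w."""
    bad = []
    for x, y in product(elements, repeat=2):
        e = omega(compose(x, y))
        f = omega(compose(y, x))
        if compose(compose(e, f), e) != e:
            bad.append((x, y))
    return bad


# Assumed encoding of the minimal automaton of K = (b*ab*a)* b Sigma*.
K_LETTERS = {"a": (1, 0, 2), "b": (2, 1, 2)}
M_K = transition_monoid(K_LETTERS)
```

With $u = a$ and $v = ba$ as in Example 5.17, the pair `(K_LETTERS["a"], compose(K_LETTERS["b"], K_LETTERS["a"]))` appears in `do_violations(M_K)`. Replacing `K_LETTERS` by an encoding of the minimal automaton of $(ab)^*$ gives the corresponding check for $B_2$.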

Example 5.18.

Consider for contrast the language of Example 2.5: we noted at the start of this section
that the language $L$ of words over $\{a, b, c\}$ such that the position holding the first $a$ has a
suffix containing an even number of $c$'s is definable by the $\mathrm{FO+MOD}_2[<]$ sentence

$$\exists x \forall y \, \big[ Q_a x \wedge ((y < x) \Rightarrow \neg Q_a y) \wedge \exists^{0 \bmod 2} y \, ((x < y) \wedge Q_c y) \big].$$

One can verify that the syntactic monoid of $L$ lies in $\mathbf{DO} \cap \mathbf{Ab}$. In the above sentence, the
modular quantifier appears within the scope of the leading existential quantifier but can in
fact be pulled out: the sentence

$$\exists^{0 \bmod 2} x \, \big[ Q_c x \wedge \exists y \, \big( (y < x) \wedge Q_a y \wedge \forall x \, [(x < y) \Rightarrow \neg Q_a x] \big) \big]$$

asserts that there are an even number of $c$'s which appear after the first occurrence of $a$
and thus also defines $L$.

To conclude our discussion on two-variable sentences, note that although the successor
relation is definable in $\mathrm{FO}[<]$ it is not possible in general to transform an $\mathrm{FO+MOD}_2[<, S]$
sentence into an equivalent $\mathrm{FO+MOD}_2[<]$ sentence. A precise characterization of the class
$\mathrm{FO}_2[<, S]$ in terms of syntactic semigroups is nonetheless given in [TW98].

5.2. Temporal Logic. 

The idea of using weakly-iterated block-products to characterize the expressive power
of two-variable FO + MOD[<] sentences came originally from the study of temporal logics.
Such logics are widely used in hardware and software verification because they are able to 
express properties of dynamic processes in a natural and intuitive way. 

A linear temporal logic formula (LTL) over the alphabet $\Sigma$ is built from atomic formulas
which are either one of the boolean constants T and F or one of the letters in $\Sigma$. We want
to think of a word $w$ satisfying the formula $a$ at 'time' $i$ if the $i$th letter of $w$ is an $a$. More
complex formulas are constructed from these atomic ones using boolean connectives and a
certain set of temporal operators. We focus here on the cases where these operators are the
unary operators $\Diamond$ (eventually in the future) and $\blacklozenge$ (eventually in the past) or the binary
operators U (until) and S (since). The terminology of course stresses the intended meaning
of these operators and we can formally define the semantics of an LTL formula $\varphi$ over $\Sigma$
for pointed words $(w, i)$ with $w \in \Sigma^*$ as follows.

• For any $(w, i)$ we have $(w, i) \models$ T and $(w, i) \not\models$ F;

• For $a \in \Sigma$ we have $(w, i) \models a$ if and only if $w_i = a$;

• $(w, i) \models \Diamond\varphi$ if there exists $i < j \leq |w|$ such that $(w, j) \models \varphi$;

• $(w, i) \models \blacklozenge\varphi$ if there exists $1 \leq j < i$ such that $(w, j) \models \varphi$;

• $(w, i) \models \varphi$ U $\psi$ if there exists $i < j \leq |w|$ such that $(w, j) \models \psi$ and $(w, i') \models \varphi$ for all
$i < i' < j$;

• $(w, i) \models \varphi$ S $\psi$ if there exists $1 \leq j < i$ such that $(w, j) \models \psi$ and $(w, i') \models \varphi$ for all
$j < i' < i$.

Note that the LTL formulas $\Diamond\varphi$ and T U $\varphi$ are equivalent and so the Until and Since
operators are sufficient to obtain the full expressive power of LTL. Any LTL formula $\varphi$
naturally defines a pointed language $P(\varphi) = \{(w, i) : (w, i) \models \varphi\}$. We also associate to $\varphi$
the language $L(\varphi) = \{w : (w, 0) \models \varphi\}$. If $\Phi$ is a class of LTL formulas, we similarly denote
by $L(\Phi)$ and $P(\Phi)$ respectively the classes of languages and pointed languages defined by a
formula of $\Phi$.

As we mentioned earlier, Kamp [Kam68] showed that $L(\mathrm{LTL}) = L(\mathrm{FO}[<])$ and in fact
$P(\mathrm{LTL}) = P(\mathrm{FOF}[<])$. The containment from left to right is rather easy to obtain by
induction on the structure of the LTL formulas. The atomic LTL formula $a$ defines the set
of pointed words $(w, i)$ having the letter $a$ in position $i$ and thus corresponds to the formula
$Q_a x$. Suppose by induction that for the LTL formulas $\varphi$ and $\psi$ we can construct $\mathrm{FOF}[<]$
formulas $\tau(x), \rho(x)$ such that $P(\varphi) = P(\tau(x))$ and $P(\psi) = P(\rho(x))$. Then the LTL formula
$\varphi$ U $\psi$ defines the same pointed language as

$$\eta(x) : \exists y \, \big[ (x < y) \wedge \rho(y) \wedge \forall z \, ((x < z < y) \Rightarrow \tau(z)) \big].$$

The translation of the other three temporal operators can be obtained similarly. Note also
that the structure of $\eta(x)$ allows us to construct this formula using only three variables and
so $L(\mathrm{LTL}) \subseteq L(\mathrm{FO}_3[<])$. The inclusion $L(\mathrm{FO}[<]) \subseteq L(\mathrm{LTL})$ essentially amounts to showing
$L(\mathrm{FO}[<]) \subseteq L(\mathrm{FO}_3[<])$ [Kam68, IK89].

For two classes $\Lambda, \Gamma$ of LTL formulas we denote as $\Lambda \circ \Gamma$ the class of LTL formulas
which are boolean combinations of formulas in $\Gamma$ and formulas obtained from a $\Lambda$ formula
by replacing each occurrence of the atomic formula $a$ by a formula $\varphi_a \in \Gamma$. The
block-product/substitution principle carries over to temporal logic: if there are pseudovarieties
$\mathbf{V}, \mathbf{W}$ such that $L(\Lambda) = L(\mathbf{V})$, $P(\Gamma) = P(\mathbf{W})$ and $L(\Gamma) \subseteq L(\mathbf{V} \Box \mathbf{W})$ then
$L(\Lambda \circ \Gamma) = L(\mathbf{V} \Box \mathbf{W})$.

The class of unary temporal logic formulas UTL is the subclass of LTL consisting
of formulas constructed without the binary operators U, S. There is a natural hierarchy
$\mathrm{UTL}_1 \subseteq \mathrm{UTL}_2 \subseteq \ldots$ within UTL defined by the nesting depth of the operators. We
clearly have

$$\mathrm{UTL}_k = \underbrace{\mathrm{UTL}_1 \circ \cdots \circ \mathrm{UTL}_1}_{k \text{ times}}.$$

Lemma 5.19. $L(\mathrm{UTL}_1) = L(\mathbf{SL})$ and $P(\mathrm{UTL}_1) = P(\mathbf{SL})$.

Proof sketch. Any $\mathrm{UTL}_1$ formula is a boolean combination of formulas of the form $a$,
$\Diamond a$ or $\blacklozenge a$. The rest of the argument is similar to the proof of part 1 of Lemma 3.5. □

Thus, the block-product/substitution principle ensures:

Corollary 5.20 ([EVW02, TW98, ST02]).

For each $k$, $L(\mathrm{UTL}_k) = L((\ldots(\mathbf{SL} \Box \mathbf{SL}) \Box \ldots) \Box \mathbf{SL})$. Moreover,

$$L(\mathrm{UTL}) = L(\mathbf{DA}) = L(\mathrm{FO}_2[<]) = L(\Sigma_2[<]) \cap L(\Pi_2[<]).$$

Example 5.21. We argued in Example 5.9 that the syntactic monoid of the language
$L = \Sigma^* a c^* d \{b, c, d\}^*$ lies in $\mathbf{DA}$ and exhibited $\Sigma_2[<]$ and $\Pi_2[<]$ sentences defining $L$ (an
equivalent $\mathrm{FO}_2[<]$ sentence was also given in Example 5.2). In temporal terms, $L$ can be
described as the set of words which contain an $a$ that has no other $a$ in its future but has in
its future an occurrence of $d$ with the property that each $b$ or $d$ in the past of this occurrence
of $d$ contains an $a$ in its future:

$$\Diamond \big[ a \wedge (\neg \Diamond a) \wedge \big( \Diamond ( d \wedge [\neg \blacklozenge ((b \vee d) \wedge \neg \Diamond a)] ) \big) \big].$$

By contrast $K = \{a, b, c\}^* a c^* a \{a, b, c\}^*$ has a syntactic monoid which is aperiodic but
outside of $\mathbf{DA}$ (Example 5.10) and so $K$ is definable in LTL but not in UTL.
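The semantics of LTL over pointed words given earlier in this section translates directly into a small evaluator, which can then be used to test the unary formula of Example 5.21 against the regular expression for $L$. The sketch below is ours: formulas are encoded as nested tuples with hypothetical operator tags (`"future"` and `"past"` for the two unary "eventually" operators, `"until"` and `"since"` for U and S), positions run from 1 to $|w|$, and a language is obtained by evaluating at position 0.

```python
import re
from itertools import product


def holds(phi, w, i):
    """Decide (w, i) |= phi; positions are 1..len(w), i = 0 is 'before the word'."""
    op, n = phi[0], len(w)
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "letter":
        return 1 <= i <= n and w[i - 1] == phi[1]
    if op == "not":
        return not holds(phi[1], w, i)
    if op == "and":
        return all(holds(p, w, i) for p in phi[1:])
    if op == "or":
        return any(holds(p, w, i) for p in phi[1:])
    if op == "future":  # some j with i < j <= |w|
        return any(holds(phi[1], w, j) for j in range(i + 1, n + 1))
    if op == "past":  # some j with 1 <= j < i
        return any(holds(phi[1], w, j) for j in range(1, i))
    if op == "until":  # phi[1] U phi[2]
        return any(
            holds(phi[2], w, j)
            and all(holds(phi[1], w, k) for k in range(i + 1, j))
            for j in range(i + 1, n + 1)
        )
    if op == "since":  # phi[1] S phi[2]
        return any(
            holds(phi[2], w, j)
            and all(holds(phi[1], w, k) for k in range(j + 1, i))
            for j in range(1, i)
        )
    raise ValueError(f"unknown operator: {op}")


A, B, D = ("letter", "a"), ("letter", "b"), ("letter", "d")
NO_LATER_A = ("not", ("future", A))

# The unary formula of Example 5.21.
PHI = (
    "future",
    ("and", A, NO_LATER_A,
     ("future", ("and", D, ("not", ("past", ("and", ("or", B, D), NO_LATER_A)))))),
)


def in_L(w):
    return re.fullmatch(r"[abcd]*ac*d[bcd]*", w) is not None


def agree_up_to(max_len):
    """True if PHI and the regular expression agree on all words up to max_len."""
    return all(
        holds(PHI, "".join(t), 0) == in_L("".join(t))
        for n in range(max_len + 1)
        for t in product("abcd", repeat=n)
    )
```

The evaluator is deliberately naive (no memoization), which is adequate for the short words used in such a check.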

The Until/Since hierarchy $(\mathrm{USH}_k)_k$ within LTL corresponds to the nesting depth
of the Until and Since operators (the unary operators do not contribute to the depth of a
formula). We set $\mathrm{USH}_0 = \mathrm{UTL}$. We have

$$\mathrm{USH}_k = \mathrm{UTL} \circ \underbrace{\mathrm{USH}_1 \circ \cdots \circ \mathrm{USH}_1}_{k \text{ times}}.$$

The Until/Since hierarchy was introduced by Etessami and Wilke [EW00] who proved that
the hierarchy is infinite. The algebraic characterization of the levels of the Until/Since
hierarchy was given by Thérien and Wilke [TW04]:

Theorem 5.22. Let $\mathbf{RB}$ be the pseudovariety of monoids satisfying $x^2 = x$ and
$xyxzx = xyzx$. Then

$$L(\mathrm{USH}_k) = L((\ldots((\mathbf{DA} \Box \mathbf{RB}) \Box \mathbf{RB}) \ldots) \Box \mathbf{RB}) \quad (k \text{ times}).$$

Roughly speaking, the proof links pointed languages of $\mathrm{USH}_1$ and pointed languages
of $P(\mathbf{RB})$. However a number of technical hurdles have to be overcome. This theorem also
guarantees that the levels of the Until/Since hierarchy are decidable although the complexity
of the algorithms provided in [TW04] is prohibitive.

The two temporal operators next and previous are also often used in the construction of 
LTL sentences. The additional expressive power offered by these operators is closely linked 
to the extra power afforded by the successor numerical predicate in first-order sentences 
and, at least intuitively, this is not a major surprise. Standard methods allow algebraic
characterizations of the expressive power of the various levels of the Until/Since hierarchy
and of UTL when next and previous operators are available [TW98, TW04].

In the context of software and hardware verification, 'future' operators U, $\Diamond$, next are
more suited to express properties and the 'past' operators $\blacklozenge$ and S are not so standard in
LTL. Kamp in fact shows that the until operator U is sufficient to obtain the full expressive
power of LTL = FO[<]. When future operators are the only ones available, substitutions
only allow additional information on the suffix of a given position and so the two-sided 
nature of the block-product makes it unsuited for the analysis. However, one can instead 
consider reverse semidirect products and obtain the correct analog of the principle. Cohen, 
Pin and Perrin used this idea to characterize the expressive power of unary future temporal 
logic [CPP93] and Therien and Wilke later extended the idea to characterize the levels of the 
Until hierarchy [TW02]. Baziramwabo, McKenzie and Therien also considered the extension 
of LTL in which new modular counting temporal operators are introduced [BMT99]. An 
early survey of Wilke provides an overview of the semigroup theoretic approach in the 
analysis of temporal logics [Wil01].

6. Logic, Algebra and Circuit Complexity

6.1. Boolean Circuits. 

We have so far considered only the case of first-order formulas with order (<) as the 
sole numerical predicate. When FO+MOD sentences have access to non-regular predicates, 
their expressive power is dramatically increased and they can provide logical characteriza- 
tions for a number of well-known classes of boolean circuit complexity. 

A boolean circuit $C$ on $n$ boolean variables $w_1, \ldots, w_n$ is a directed acyclic graph with a
distinguished output node of outdegree 0. A node of in-degree 0 is called an input node (or
input gate) and is either labeled by one of the boolean constants 0, 1 or by some boolean
literal $w_i$ or $\overline{w_i}$. Any other node $g$ of $C$ (including the output node) is labeled by some
symmetric boolean function $f_g$ chosen from some predetermined base. The most standard
case has each inner node labeled either by the Or or the And function but we also consider
the case where gates are labeled by the boolean function $\mathrm{MOD}_m$ which is 1 if the sum of its
inputs is divisible by $m$ and is 0 otherwise. Any gate $g$ of a boolean circuit on $n$ variables
naturally computes a boolean function $v_g : \{0, 1\}^n \rightarrow \{0, 1\}$. If the gate $g$ is an input node
labeled by $w_i$ (resp. $\overline{w_i}$) then $v_g(w) = 1$ if and only if $w_i = 1$ (resp. $w_i = 0$). If $g$ is an
inner node then a gate $g'$ is an input to $g$ if there is a directed edge $(g', g)$ in the graph $C$.
Naturally, if $g_1, \ldots, g_k$ are the inputs of $g$ we set

$$v_g(w) = f_g(v_{g_1}(w), \ldots, v_{g_k}(w)).$$

If out is the output node of $C$ then the function computed by the circuit is $C(w) = v_{out}(w)$.
The language accepted by the circuit is the set $\{w \in \{0, 1\}^n : C(w) = 1\}$ of $n$-bit strings on
which the circuit outputs 1.
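The recursive definition of $v_g$ can be illustrated by a small evaluator. In the Python sketch below a circuit is a dictionary from gate names to node descriptions; this representation, the gate names and the example circuit are our own illustrative choices. The `mod_gate` function follows the convention above: $\mathrm{MOD}_m$ outputs 1 exactly when the sum of its inputs is divisible by $m$.

```python
from itertools import product


def and_gate(bits):
    return int(all(bits))


def or_gate(bits):
    return int(any(bits))


def mod_gate(m):
    """MOD_m: 1 if the sum of the inputs is divisible by m, 0 otherwise."""
    def gate(bits):
        return int(sum(bits) % m == 0)
    return gate


def evaluate(circuit, output, w):
    """Compute v_output(w). Nodes are ("const", b), ("input", i, negated)
    or (f, [names of the input gates]) for an inner gate labeled by f."""
    values = {}

    def value(g):
        if g not in values:
            node = circuit[g]
            if node[0] == "const":
                values[g] = node[1]
            elif node[0] == "input":
                _, i, negated = node
                values[g] = 1 - w[i] if negated else w[i]
            else:
                f, inputs = node
                values[g] = f([value(h) for h in inputs])
        return values[g]

    return value(output)


def accepted_language(circuit, output, n):
    """The set of n-bit inputs on which the circuit outputs 1."""
    return {w for w in product((0, 1), repeat=n) if evaluate(circuit, output, w) == 1}


# Example on 3 variables: (w_0 AND w_1) OR MOD_2(w_0, w_1, w_2).
EXAMPLE = {
    "x0": ("input", 0, False),
    "x1": ("input", 1, False),
    "x2": ("input", 2, False),
    "g_and": (and_gate, ["x0", "x1"]),
    "g_mod": (mod_gate(2), ["x0", "x1", "x2"]),
    "out": (or_gate, ["g_and", "g_mod"]),
}
```

Caching the value of each gate in `values` means every gate is evaluated once, even when the underlying graph is not a tree.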

The depth $d$ of a circuit $C$ is the length of the longest path from an input node to
the output node. The size $s$ of $C$ is the number of gates in $C$. We are also interested
in considering circuits in which inputs $w_i$ are not booleans but rather take values in some
finite alphabet $\Sigma$. This can be handled either by using a binary encoding of $\Sigma$ or by labeling
input nodes by functions $w_i = a$ for some $a \in \Sigma$. The rest of our discussion is unaffected
by these implementation details.

By definition a boolean circuit can only process inputs of some fixed length $n$ but we
are interested in using circuits as computing devices recognizing languages in $\Sigma^*$. This can
be done by providing an infinite family of circuits $\mathcal{C} = \{C_n\}_{n \geq 0}$ where the circuit $C_i$
processes inputs of length $i$. In this case, we define the size $s(\mathcal{C})$ and the depth $d(\mathcal{C})$ of a
circuit family as functions of the input size.

Note that for any subset $K \subseteq \mathbb{N}$, the language $\{w : |w| \in K\}$ can be recognized
by a family of circuits of depth and size 1 since inputs of a given length are either all
accepted or all rejected. If we do not impose any constraints on the constructibility of circuit
families, boolean circuits are thus able to recognize undecidable languages. Uniformity
restrictions on circuit families impose the existence of an (efficient) algorithm that computes
some representation of the $n$th circuit $C_n$ of a family. We say that a family of circuits $\mathcal{C}$
is uniform if such an algorithm exists and furthermore say that $\mathcal{C}$ is P-uniform (resp. L-uniform)
if there is a polynomial time (resp. logarithmic space) algorithm which on input $1^n$
constructs $C_n$. An even more stringent requirement is that of DLOGTIME-uniformity which
requires the existence of an algorithm which on input $(n, i, j)$ computes in time $O(\log n)$
the type of the $i$th and $j$th gates of $C_n$ and determines whether these gates are connected
by a wire [BIS90].
We define some classical circuit complexity classes:

Definition 6.1. The boolean circuit complexity class non-uniform $\mathrm{AC}^0$ is the class of
languages which are computable by a family $\mathcal{C} = \{C_n\}_{n \geq 0}$ of circuits constructed with And
and Or gates with depth $d(\mathcal{C}) = O(1)$ and size $s(\mathcal{C}) = O(n^k)$ for some $k$.

Similarly, non-uniform $\mathrm{CC}^0$ is the class of languages computable by families of circuits
of bounded depth and polynomial size and constructed with gates $\mathrm{MOD}_m$ for some $m \geq 2$.
Non-uniform $\mathrm{ACC}^0$ is the class of languages computable by families of circuits of bounded
depth and polynomial size and constructed with gates And, Or and $\mathrm{MOD}_m$ for some $m \geq 2$.

Finally, non-uniform $\mathrm{NC}^1$ is the class of languages computable by families of circuits of
depth $O(\log n)$, polynomial size and constructed with And and Or gates of fan-in 2.

There are natural uniform versions of the above classes. By definition both $\mathrm{AC}^0$ and
$\mathrm{CC}^0$ are subclasses of $\mathrm{ACC}^0 \subseteq \mathrm{NC}^1$. Moreover, L-uniform $\mathrm{NC}^1$ is a subclass of L (logspace).
The containment $\mathrm{AC}^0 \subseteq \mathrm{ACC}^0$ is known to be strict because the parity function (i.e. the
$\mathrm{MOD}_2$ function) cannot be computed by bounded depth And, Or circuits of subexponential
size [Ajt83, FSS84, Smo86]. It is conjectured that $\mathrm{CC}^0$ is also strictly contained in $\mathrm{ACC}^0$
and, in particular, that the And function requires bounded depth $\mathrm{MOD}_m$ circuits of
super-polynomial size. Despite an impressive body of work in circuit complexity [All97, Vol99], no
such lower bound is known and even much weaker statements such as DLOGTIME-uniform
$\mathrm{CC}^0 \neq \mathrm{NP}$ still elude proof.

These circuit classes have nice logical descriptions which were made explicit by Gurevich,
Lewis, Barrington, Immerman and Straubing [GL84, Imm87, BIS90, Str94].

Theorem 6.2.

$\mathrm{AC}^0 = \mathrm{FO}$ (i.e. FO extended with all numerical predicates) and DLOGTIME-$\mathrm{AC}^0 = \mathrm{FO}[+, *]$.

$\mathrm{CC}^0 = \mathrm{MOD}$ and DLOGTIME-$\mathrm{CC}^0 = \mathrm{MOD}[+, *]$.

$\mathrm{ACC}^0 = \mathrm{FO+MOD}$ and DLOGTIME-$\mathrm{ACC}^0 = \mathrm{FO+MOD}[+, *]$.

Proof sketch. The statements about DLOGTIME uniformity are too technical to present
succinctly [BIS90] but it is rather straightforward to prove, for instance, that $\mathrm{AC}^0 = \mathrm{FO}$.
For the right to left containment, we need to build for any FO sentence $\varphi$ a non-uniform
$\mathrm{AC}^0$ circuit family $\mathcal{C}$ that accepts exactly $L_\varphi$. We assume without loss of generality that $\varphi$
is in prenex normal form:

$$\varphi : Q_1 x_1 Q_2 x_2 \ldots Q_k x_k \, \psi(x_1, \ldots, x_k)$$

where $\psi$ is quantifier free and each $Q_i$ is $\exists$ or $\forall$. Circuit $C_n$ is obtained by using Or
and And gates to respectively represent the existential and universal quantifiers. Each of
those gates has fan-in $n$ so that a wire into the gate representing $Q_i x_i$ represents one of
the $n$ possible values of $x_i$. Finally, for any choice of values $(x_1, \ldots, x_k)$ we need to build
subcircuits computing the value of $\psi(x_1, \ldots, x_k)$: the atomic formulas of the form $Q_a x_i$
are evaluated using a query to the input position $x_i$ and the value of a numerical predicate
$R(x_{i_1}, \ldots, x_{i_t})$ can be hardwired into the $n$th circuit since the value of $R$ only depends on
the value of the $x_{i_j}$ and the input length $n$. Note that the size of the $n$th circuit built in
this way is at most $c \cdot n^{k+1}$ for some $c > 1$.
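The construction just described can be sketched in a few lines of Python. The function `build` below turns a prenex sentence into a tree-shaped circuit for inputs of a fixed length $n$: each $\exists$ becomes an Or gate and each $\forall$ an And gate of fan-in $n$, letter predicates become input queries, and numerical predicates are hardwired as constants. The tuple representation and the example sentence $\forall x \exists y\, [Q_1 x \rightarrow ((x < y) \wedge Q_0 y)]$ over the alphabet $\{0, 1\}$ are our own illustrative choices.

```python
from itertools import product


def build(quantifiers, matrix, n, assignment=()):
    """Circuit for Q_1 x_1 ... Q_k x_k psi on inputs of length n (nested tuples)."""
    if not quantifiers:
        return matrix(assignment, n)
    gate = "or" if quantifiers[0] == "exists" else "and"
    children = tuple(
        build(quantifiers[1:], matrix, n, assignment + (value,))
        for value in range(n)  # one wire per possible value of the variable
    )
    return (gate,) + children


def evaluate(node, w):
    kind = node[0]
    if kind == "const":  # hardwired value of a numerical predicate
        return node[1]
    if kind == "lit":  # query: does position node[1] hold letter node[2]?
        return w[node[1]] == node[2]
    values = [evaluate(child, w) for child in node[1:]]
    return any(values) if kind == "or" else all(values)


def size(node):
    """Number of nodes of the tree-shaped circuit."""
    if node[0] in ("const", "lit"):
        return 1
    return 1 + sum(size(child) for child in node[1:])


def example_matrix(assignment, n):
    """psi(x, y) = Q_1 x -> ((x < y) and Q_0 y); over {0,1}, 'not Q_1 x' is Q_0 x."""
    x, y = assignment
    return ("or", ("lit", x, "0"), ("and", ("const", x < y), ("lit", y, "0")))


def example_circuit(n):
    return build(("forall", "exists"), example_matrix, n)
```

For this example sentence (every 1 is followed later by a 0), `evaluate(example_circuit(len(w)), w)` accepts exactly the words that are empty or end with 0. Note how the order predicate $x < y$ never inspects the input: its truth value is fixed once $n$ and the wire indices are known.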

To show $\mathrm{AC}^0 \subseteq \mathrm{FO}$, we first normalize our circuit family $\mathcal{C}$ so that each $C_n$ of $\mathcal{C}$ is
a tree of depth $k$ which is leveled so that gates at level $i$ in any circuit of the family are
either all Or or all And gates. Moreover, we ensure that every non-input gate has fan-in
exactly $n$ so that we can think of these $n$ wires as being indexed by positions in the input.






By extension, any sequence of k input positions can be viewed as a path from the output 
gate back to some input gate. 

It is a simple exercise to show that the normalization process of our circuit family can
be done so that the resulting family still has bounded depth and polynomial size. The
construction of an FO sentence defining the language accepted by $\mathcal{C}$ then follows naturally:
if the family of circuits has depth $k$, the sentence has $k$ quantifiers where existential and
universal quantifiers are used to respectively represent levels of Or gates and And gates.
We complete the construction by using a $(k + 1)$-ary numerical predicate $R(i, x_1, \ldots, x_k)$
which is true if the path $(x_1, \ldots, x_k)$ from the output gate back to the input queries the $i$th
bit of the input. Note also that the non-uniformity of the family of circuits can be handled
easily since we allow the value of the numerical predicates to depend on the length of the
input word. □

Note that the polynomial-size restriction in the definition of $\mathrm{AC}^0$, $\mathrm{CC}^0$ and $\mathrm{ACC}^0$ is
in some sense built into this correspondence with first-order logic. Lautemann [KLPT06]
further noted that when arbitrary numerical predicates are used, the restriction of FO,
FO + MOD and MOD to two variables corresponds to a linear-size restriction on the
corresponding circuits.

Theorem 6.3. A language $L$ is computable by a family of $\mathrm{AC}^0$ (resp. $\mathrm{CC}^0$, $\mathrm{ACC}^0$) circuits
of size $O(n)$ if and only if $L$ is definable by a two-variable $\mathrm{FO}_2$ (resp. $\mathrm{MOD}_2$, $\mathrm{FO+MOD}_2$)
sentence with arbitrary numerical predicates.

$\mathrm{AC}^0$, $\mathrm{CC}^0$ and $\mathrm{ACC}^0$ (as well as a number of other important circuit complexity
classes) also admit very interesting algebraic characterizations using the programs over finite
monoids formalism. The idea first appeared in Chandra, Stockmeyer and Vishkin [CSV84]
but was formalized and further developed by Barrington and Thérien [Bar89, BT88]. A
number of lower bounds for restricted classes of circuits can be obtained through this 
approach [BST90, BS94, IST06] . A detailed account of this line of work is beyond our 
scope but we refer the interested reader to Straubing's book [Str94] or one of the sur- 
veys [MFTaH EtrQi [TlIIl HT06] . 



6.2. Bounding the Expressive Power of FO+MOD: Partial Results. 

The logical description of circuit classes suggests a natural incremental approach to
obtaining strong complexity separation results such as the strict containment of non-uniform
$\mathrm{CC}^0$ in $\mathrm{ACC}^0$ or of non-uniform $\mathrm{ACC}^0$ in Logspace. Such results amount to bounding
the expressive power of FO + MOD or MOD and while this seems a deep mathematical
challenge we can hope that for sufficiently simple classes of numerical predicates $\mathcal{N}$ it is
at least possible to bound the expressive power of $\mathrm{FO+MOD}[\mathcal{N}]$ and $\mathrm{MOD}[\mathcal{N}]$. On one
hand $\mathrm{FO+MOD}[Reg] = \mathrm{FO+MOD}[<]$ contains only the regular languages with solvable
monoids but, on the other hand, even bounding the expressive power of $\mathrm{FO+MOD}[+, *]$ is
beyond the capabilities of current lower bound technology and so it makes sense to consider
classes of numerical predicates with intermediate expressive power.

The obvious target is of course $\mathcal{N} = \{+\}$. Lynch proved that Parity is not expressible
in $\mathrm{FO}[+]$ [Lyn82a, Lyn82b] (see also [BIL+01]). Building on work exposed in Libkin's
book [Lib04], Roy and Straubing further showed that if $p$ is a prime that does not divide
$q$ then the language $\mathrm{Mod}_p$ is not expressible in $\mathrm{FO+MOD}_q[+]$ where the $q$ subscript
indicates that only quantifiers counting modulo $q$ are used [RS06]. In later work, Behle
and Lange [BL06] translated the restriction of $\mathcal{N}$ to $\{+\}$ into a uniformity restriction on
circuits. Lautemann et al. [LMSV01], Schweikardt [Sch05] and Lange [Lan04] all provided
further evidence of the fairly weak expressive power of addition even in the case where FO
is augmented by so-called counting quantifiers or majority quantifiers.

In a somewhat different direction, Nurmonen [Nur00] and Niwiński and Stolboushkin [SN97]
considered logics equipped with numerical predicates of the form $y = kx$ for some
integer $k$ and in particular establish that there is no $\mathrm{FO+MOD}_q[<, \{y = qx\}]$ sentence that
defines the set of words whose length is divisible by $p$ where $p$ does not divide $q$.

If we are trying to exhibit a language $L$ that cannot be defined in $\mathrm{FO+MOD}[\mathcal{N}]$ for
some class $\mathcal{N}$ of numerical predicates, it makes sense to choose $L$ so that the predicates in $\mathcal{N}$
seem particularly impotent in a sentence defining $L$. This intuition is of course difficult to
formalize but it led to the study of languages with a neutral letter. A letter $e \in \Sigma$ is said to
be neutral for $L \subseteq \Sigma^*$ if for all $u, v \in \Sigma^*$ it holds that $uev \in L \Leftrightarrow uv \in L$. In other words $e$ is
neutral for $L$ if $e$ is equivalent to the empty word $\epsilon$ under the syntactic congruence of $L$. At
least intuitively, it is difficult to construct circuits to recognize languages having a neutral
letter because they cannot rely on the precise location of the relevant (i.e. non-neutral)
letters of their input. By the same token, access to arbitrary numerical predicates seems
of little help to define these languages. Lautemann and Thérien conjectured that every
language with a neutral letter recognized in $\mathrm{AC}^0$ is in fact a star-free regular language. This
so-called Crane-Beach conjecture was in fact refuted in [BIL+01]: if $\mathcal{L}_e$ denotes the class
of languages with a neutral letter, then there is a language in $(\mathrm{FO}[+, *] \cap \mathcal{L}_e) - \mathrm{FO}[<]$.
Nevertheless, the same authors proved

$$\mathrm{FO}[+] \cap \mathcal{L}_e = \mathrm{FO}[<] \cap \mathcal{L}_e$$

and

$$BC(\Sigma_1) \cap \mathcal{L}_e = BC(\Sigma_1[<]) \cap \mathcal{L}_e$$

where $BC$ denotes the boolean closure. Let $\mathrm{MOD}_p$ be the class of languages definable by a
MOD sentence using only quantifiers that count modulo $p$ for some prime $p$ and arbitrary
numerical predicates. Lautemann and the two current authors have shown [LTT06] that

$$\mathcal{L}_e \cap \mathrm{MOD}_p = \mathrm{MOD}_p[<] \cap \mathcal{L}_e.$$

The neutral letter hypothesis has proved useful in other similar contexts, in particular
to obtain superlinear lower bounds for bounded-width branching programs [BS95] and in
communication complexity [RTT98, TT05a, CKK+07].
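For a regular language, the neutral letter condition defined earlier in this section is easy to test on an automaton: since a neutral letter is equivalent to the empty word under the syntactic congruence, a letter is neutral exactly when it fixes every state of the minimal complete deterministic automaton. The Python sketch below implements this test together with a brute-force cross-check on short words. The dictionary-based automaton format and the example (our encoding of the minimal automaton of $\{a, b, c\}^* a c^* a \{a, b, c\}^*$ from Example 5.10) are illustrative assumptions.

```python
from itertools import product


def run(delta, start, word):
    """State reached from start after reading word."""
    q = start
    for letter in word:
        q = delta[q][letter]
    return q


def is_neutral(delta, letter):
    """In a minimal complete DFA, a letter is neutral iff it fixes every state."""
    return all(row[letter] == q for q, row in delta.items())


def is_neutral_bruteforce(delta, start, accepting, letter, alphabet, max_len):
    """Check u e v in L <=> u v in L for all u, v of length at most max_len."""
    words = ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]

    def member(w):
        return run(delta, start, w) in accepting

    return all(member(u + letter + v) == member(u + v) for u in words for v in words)


# Assumed minimal automaton of K = {a,b,c}* a c* a {a,b,c}*:
# state 0: no pending a, state 1: last letter other than c was a, state 2: accept.
K_DELTA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 2, "b": 0, "c": 1},
    2: {"a": 2, "b": 2, "c": 2},
}
```

On this example `is_neutral(K_DELTA, "c")` holds while the letters $a$ and $b$ are not neutral; the state-based test is only valid when the automaton is minimal, which is why the brute-force variant is included as a cross-check.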

6.3. The Circuit Complexity of Regular Languages. 

Regular languages are a fascinating case study in circuit complexity [BCST92, CS01,
Pél92, PST97, Str94, TT06]. As we mentioned earlier, one of the most celebrated results in
complexity theory is the lower bound on the size of $\mathrm{AC}^0$ circuits computing the regular
language parity. Moreover, from the results of [BT88, MPT91] the main current conjectures
on separations of circuit complexity classes amount to answering questions about the circuit
complexity of specific regular languages. For instance, $\mathrm{CC}^0$ is strictly contained in $\mathrm{ACC}^0$
if and only if And is not in $\mathrm{CC}^0$ and $\mathrm{ACC}^0$ is strictly contained in the circuit class $\mathrm{NC}^1$ if
and only if regular languages with non-solvable syntactic monoids are not recognizable in
$\mathrm{ACC}^0$.
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Some of these questions can be recast in purely model-theoretic terms (BCST92 , Str92, 
Str94, STT93, P6192J. Intuitively, the only numerical predicates that can be of any sig- 
nificant use in defining regular languages are the regular predicates described at the end 
of Section 0] For example, [BCST92J used the fact that the MOD p -functions do not lie in 
AC to show that a regular language is definable in FO iff it is definable in FO [Reg] . If 1Z 
denotes the class of regular languages then the conjectured separation of ACC° from NC 1 
is equivalent to the statement 

$$\mathcal{R} \cap \mathrm{FO{+}MOD} = \mathrm{FO{+}MOD}[\mathrm{Reg}]$$

and, similarly, CC⁰ ≠ ACC⁰ is equivalent to 

$$\mathcal{R} \cap \mathrm{MOD} = \mathrm{MOD}[\mathrm{Reg}].$$

These equivalences are discussed in full detail in [Str94] and we simply sketch here the 
argument for the first of them. Assume that ACC⁰ = NC¹: since every regular language is 
in NC¹, we have ℛ ∩ FO+MOD = ℛ ∩ ACC⁰ = ℛ ∩ NC¹ = ℛ, whereas FO+MOD[Reg] = 
FO+MOD[<] contains only those regular languages whose syntactic monoid is solvable 
(Theorem 4.4), so the two sides of the statement differ. 

On the other hand, Barrington and Thérien [BT88] showed that any regular language 
whose syntactic monoid is not solvable (and which is therefore not definable in FO+MOD[Reg]) 
is complete for NC¹ under very simple reductions known as non-uniform projections or 
programs. Therefore, if ACC⁰ ≠ NC¹ then none of these languages lies in ACC⁰ and 
ℛ ∩ FO+MOD = FO+MOD[<]. 

We can refine our questions about the circuit complexity of regular languages and ask 
how small the AC⁰, CC⁰ and ACC⁰ circuits recognizing them can be. For AC⁰, a surprising 
partial answer was provided by Chandra, Fortune and Lipton [CFL85], who showed that any 
regular language computed by an AC⁰ circuit can in fact be computed by an AC⁰ circuit 
with only O(n g^{-1}(n)) wires (and thus gates) for any primitive recursive function g. The 
result in fact extends to ACC⁰. The only regular languages known to be (and believed to be) 
in CC⁰ are those definable in MOD_2[Reg] and, by Theorem 6.3, these can all be recognized 
by circuits with O(n) gates. It is tempting to further conjecture that any regular language 
which is not definable in FO_2[Reg] (resp. FO+MOD_2[Reg]) is in fact not definable in FO_2 
(resp. FO+MOD_2) and therefore requires superlinear-size AC⁰ (resp. ACC⁰) circuits. In 
other words, superlinear-size lower bounds for AC⁰ and ACC⁰ circuits can conceivably be 
obtained through logical methods such as Ehrenfeucht-Fraïssé games showing that a given 
language is not FO_2 or FO+MOD_2 definable. 
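
To get a feel for how close to linear the Chandra, Fortune and Lipton bound is, one can 
instantiate g with fast-growing primitive recursive functions. The worked instance below is 
only a sketch: it assumes that g^{-1}(n) denotes the least m with g(m) ≥ n, a convention 
that is not spelled out in the text above. 

```latex
% Assumption (not stated in the survey): g^{-1}(n) = \min\{ m : g(m) \ge n \}.
\begin{align*}
g(m) = 2^m
  &\;\Longrightarrow\; g^{-1}(n) = \lceil \log_2 n \rceil,
  && \text{wire bound } O(n \log n), \\
g(m) = \underbrace{2^{2^{\cdot^{\cdot^{2}}}}}_{m \text{ twos}}
  &\;\Longrightarrow\; g^{-1}(n) = \log^* n \ \text{(up to an additive constant)},
  && \text{wire bound } O(n \log^* n).
\end{align*}
```

Since both choices of g are primitive recursive, the bound can be pushed below either of 
these, which is why the remaining gap to O(n) is so narrow. 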

Koucký, Pudlák and Thérien considered the class of regular languages (with a neutral 
letter) which are recognizable by ACC⁰ circuits with only O(n) wires. 

Theorem 6.4. If L is a regular language with a neutral letter then L can be recognized by a 
family of ACC⁰ circuits with O(n) wires if and only if L ∈ L(DO ∩ Ab) if and only if L is 
definable by an FO+MOD_2[<] sentence in which no modular quantifier lies in the scope of 
another quantifier. 

The superlinear lower bound needed to obtain this theorem requires significant work and 
relies on an extension of deep combinatorial results of Pudlák on superconcentrators [Pud94] 
and on a linear lower bound [TT05a] on the communication complexity of regular languages 
which do not belong to L(DO ∩ Ab). 

The upper bound is based on a result of Bilardi and Preparata [BP90] which exhibits 
an AC⁰ circuit with n inputs x_1, ..., x_n, 2n output gates and only O(n) wires which, on 
an input in {0,1}^n, computes the Or function of each prefix x_1 ... x_i and each suffix 
x_i ... x_n of the input. To build circuits with O(n) wires recognizing languages in 
L(DO ∩ Ab) it is convenient [TT05b] to make use of their logical characterization given by 
Corollary 5.16: any such language is definable by an FO+MOD_2[<] sentence in which no 
modular quantifier appears in the scope of another quantifier. We illustrate the upper 
bound on an example. 

Example 6.5. 

Consider the language L = {b, c}*a({a, b}*c{a, b}*c{a, b}*)* which we already studied in 
Example 2.5 and in two examples of Section 5. We saw that L can be defined by the 
FO+MOD_2[<] sentence 

$$\varphi :\ \exists^{0 \bmod 2} x\, [Q_c x \wedge (\exists y\, (y < x) \wedge (Q_a y \wedge \forall x\, [(x < y) \rightarrow \neg Q_a x]))]$$

We want to build a circuit C with O(n) wires verifying w ⊨ φ. As a first step, we build 
a subcircuit C_ψ with O(n) outputs which simultaneously computes for all 1 ≤ y ≤ n the 
boolean value of the subformula 

$$\psi(y) :\ Q_a y \wedge \forall x\, [(x < y) \rightarrow \neg Q_a x].$$

This subformula is true at y if and only if y contains the first occurrence of a in w. Using 
Bilardi and Preparata's construction we can build a subcircuit with O(n) wires and n 
outputs which simultaneously tells us for each y if the prefix preceding y contains an a and 
this allows the construction of C_ψ. 

We can now use the same idea to build a subcircuit C_η with O(n) wires and n output 
gates which uses the outputs of C_ψ as inputs in order to compute simultaneously for all x 
the value of 

$$\eta(x) :\ Q_c x \wedge (\exists y\, [(y < x) \wedge \psi(y)]).$$

Finally, we complete the construction of our circuit C by feeding the n outputs of C_η into 
a MOD_2 output gate for C. 
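
The three layers of this construction can be mimicked by a short sequential program, which 
may help when checking the formulas on small words. The sketch below is not a circuit and 
says nothing about wire counts: it only evaluates the sentence on a word, reading ψ(y) as 
"y holds the first occurrence of a", as stated in the text. The function names 
strict_prefix_or and eval_phi are hypothetical, and each prefix-Or pass merely stands in 
for one use of Bilardi and Preparata's construction. 

```python
def strict_prefix_or(bits):
    """Return out with out[i] = OR of bits[0..i-1] (positions strictly before i)."""
    out, seen = [], False
    for b in bits:
        out.append(seen)
        seen = seen or b
    return out


def eval_phi(w):
    """Evaluate the sentence of Example 6.5 on a word w over {a, b, c}."""
    is_a = [ch == "a" for ch in w]
    is_c = [ch == "c" for ch in w]

    # Layer 1 (psi): y carries an 'a' and no 'a' occurs strictly before y.
    a_before = strict_prefix_or(is_a)
    psi = [qa and not before for qa, before in zip(is_a, a_before)]

    # Layer 2 (eta): x carries a 'c' and some y < x satisfies psi.
    psi_before = strict_prefix_or(psi)
    eta = [qc and before for qc, before in zip(is_c, psi_before)]

    # Layer 3: the modular quantifier, i.e. the number of x satisfying eta is even.
    return sum(eta) % 2 == 0
```

Note that the modular quantifier is applied only once, at the very end, to values computed 
by purely first-order layers; this mirrors the condition that no modular quantifier lies in 
the scope of another quantifier. 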

This example has a straightforward generalization providing the upper bound for all 
regular languages in L(DO ∩ Ab). We know from Theorem 6.3 that a language K is 
computable by a family of ACC⁰ circuits with O(n) gates if and only if K is FO+MOD_2 
definable given arbitrary numerical predicates, but there is no similar logical characterization 
for the class of ACC⁰ circuits with O(n) wires. Theorem 6.4 indicates that the fine line 
separating O(n) gates and O(n) wires may be related to the ability or inability to pull 
out modular quantifiers in FO+MOD_2 sentences. 

7. Conclusion 

We believe that the block-product/substitution principle largely explains the success 
of semigroup theory in the analysis of the expressive power of fragments of FO + MOD[<] 
and LTL. In particular, we have tried to show that it underlies some of the most important 
results about the expressivity of fragments of FO + MOD[<] because it translates these 
logical questions into algebraic questions about decomposition of pseudovarieties through 
iterated block-products. 

Considerable efforts have been invested in the development of an analogous algebraic 
approach to regular tree-languages. There currently exists no known algorithm for de- 
ciding whether a tree language is definable in FO[<] where < is the descendant relation 
in trees. While most agree that this question will inevitably be solved using some 
algebraic framework, it is rather unclear what the correct framework is. For instance, one 
can define the syntactic monoid of a regular tree language L as the transition monoid of 
the minimal tree-automaton for L. It is known that if a regular tree language is 
FO[<]-definable then its syntactic monoid is aperiodic, but that condition is known to be 
insufficient [Heu91, PT93]. This strongly suggests that the combinatorial properties of regular 
tree-languages are not properly reflected in the algebraic properties of their syntactic monoids. 
Ésik and Weil proposed to consider instead syntactic preclones. They obtain an analog of 
the block-product/substitution principle and show that a tree language is FO[<]-definable 
iff its syntactic preclone belongs to the smallest pseudovariety of preclones containing a 
very simple pseudovariety of preclones and closed under block product [EW05]. 
Unfortunately, too little is known about this pseudovariety to make this characterization effective. 
That the block-product/substitution principle generalizes to more complex settings is not 
much of a surprise since it simply provides a scheme to reformulate a logical question into 
algebraic terms, but there are no known preclone analogs of the block-product decomposition 
results that exist for monoids and this impedes progress. 

There are decidability results for subclasses of FO[<]-definable tree languages (e.g. 
[BW04]), some of which rely on the study of tree algebras proposed by Wilke [Wil96]. This 
first led to an effective algebraic characterization of frontier testable tree languages [Wil96] 
and, more recently, Benedikt and Segoufin used a similar framework to provide an effective 
algebraic characterization¹ of tree languages definable in FO[S] (where S is the child 
relation) [BS05]. The recent results of Bojańczyk et al. on pebble automata [BSSS06] also 
seem to be tightly connected to some variant of block-products although the authors do not 
explicitly give an algebraic interpretation of their work. 

We focused in this survey on the case where logical sentences are interpreted over 
finite words. However, Büchi's Theorem also holds for infinite words: an ω-language is 
ω-regular if and only if it can be defined by an MSO[<]-sentence. The algebraic theory of 
ω-regular languages is well-developed although not as robust as the one presented here for 
the case of finite words [PP04]. Still, the ω-languages definable in FO[<] and in LTL 
are exactly the starfree ω-languages [Tho79, SPW91, Coh91] which, in turn, are exactly 
those recognizable by aperiodic ω-semigroups [Per83]. Because the results for finite words 
often extend to the infinite case [Lib04, PP04, Pin96, Pin01, Tho97], it is tempting to 
overlook the related caveats. It would be interesting to specifically consider how the 
block-product/substitution principle extends to the case of infinite words to unify the existing 
results. The work of Carton [Car00] probably provides all the necessary tools for this 
investigation. 

More generally, as Weil clearly demonstrates in [Wei04], there are numerous extensions 
of the algebraic point of view on finite automata and regular languages which have proved to 
be successful in the analysis of more sophisticated machines and more sophisticated logical 
formalisms. These include regular sets of traces [DR95], series-parallel pomsets [Kus03, 
LW00] and graphs, as well as timed automata [BDM+06, FK03, MP04, BPT03]. 

¹ Segoufin has recently acknowledged that the characterization given in the conference paper is incorrect, 
but the decidability result still stands [Seg] and a corrected manuscript is available from Segoufin's home 
page. 